Diving into the embarrassing engineering behind CrowdStrike

  Views: 73,800

Theo - t3․gg

A day ago

Comments: 635
@LadyEmilyNyx 5 months ago
This is one of those times when if you wrote it in a movie, everyone would say "That's so unrealistic. No company could be THAT incompetent without going out of business."
@andrasfogarasi5014 5 months ago
Well, any business can demonstrate incompetence at least once before failing.
@pyrouscomments 5 months ago
We don't know that they won't, though.
@akam9919 5 months ago
...well, this is crowdstrike going out of business.
@RipVanFish09 5 months ago
I hope they go out of business, because clearly they can’t be trusted. (Also, this has happened twice before, just on operating systems that the entire world doesn’t run on.) The amount of negligence is dumbfounding.
@2rx_bni 5 months ago
Real life is so much wackier than fiction.
@kevinletterer4171 5 months ago
The 10 dollar gift card is a trap: if you accept it, they can claim that they have already compensated you for any damages and that you agreed to it.
@ryanquinn1257 5 months ago
My thoughts exactly. Had this happen MANY times during a brief stretch, summer into Xmas, at Costco. They’d offer some compensation, usually $15-50 gift cards, but if you had accepted those you weren’t part of the class action for those wrongdoings later. Same for insurance companies offering a lowball "get money now" amount vs. waiting and making sure you’re all good physically after a crash.
@earchinmc1080 5 months ago
They did not offer 10 dollar gift cards to their customers (aka the people affected). They were handing them out to sellers of the software.
@sad_man_no_talent 5 months ago
but that's 10 dollar man u gotta understand
@klam77 5 months ago
shysters! What a slick move! OMG. S/W TORT NOW.
@SammmN 5 months ago
The $10 is just insulting. I was in the fire department when it happened and NOTHING was working. The firefighters had to use the radios instead of computers. My friend was working in the hospital and computers were down.
@arcanernz 5 months ago
It’s not CrowdStrike’s fault: the update worked on Dan’s computer; the millions of other computers were just edge cases.
@autohmae 5 months ago
If they haven't signed the 'content', this is probably the worst failure of all, for a _security_ company, because they are creating a HUGE security issue.
@desertdude540 5 months ago
The craziest thing is that the CEO of Clownstrike was CTO at McAfee when they pushed the update that thought svchost.exe was malware and nuked millions of Windows XP installations.
@JoyceMuller-xv6kh 5 months ago
This global internet outage is insane! All airlines grounded and I was stuck at the airport, and even banks, media, and offices from the U.S. to Australia. How can CrowdStrike have such a monopoly that could help restore such a massive amount of tech?
@JackMyers-br2vi 5 months ago
It's pretty concerning. If they can fix this, what other control do they have over our infrastructure? Or are we truly in the matrix?
@MattMiller211 5 months ago
Right? It makes you think about the stability of our systems. But hey, I barely spend time online. When I checked my portfolio with Desiree Ruth Hoffman, we were still in the greens. That’s been the case for 16 months straight!
@AlexYoung-21 5 months ago
Wow, really? I've seen the name Desiree Ruth Hoffman before but can't figure out where.
@JackMyers-br2vi 5 months ago
Probably from her forecast on Nvidia before the pump. But how are you in the greens with all the fluctuations due to the election and everything else? Can you share her strategy?
@MattMiller211 5 months ago
Honestly, just schedule a call with her. She has vast knowledge in finance and really knows how to navigate these times. I handed over my portfolio to her so I can focus on my family. These days, things just get scarier and scarier.
@PatNeedhamUSA 5 months ago
I find it kind of hilarious that a third or half of their PIR document is marketing material
@jbreckmckye 5 months ago
And on the side is a huge banner reading SEE CROWDSTRIKE© FALCON® IN ACTION!
@stage6fan475 5 months ago
This company is named 'Clownstrike' for the rest of time.
@GreyDeathVaccine 5 months ago
Yep, until the Sun swells and swallows the Earth.
@theapexsurvivor9538 5 months ago
I mean, I'd almost argue that they earned cloudstrike, as it's basically a cloud software and it was more effective than a LOIC...
@ManaVkum-w9b 5 months ago
One of the most comprehensive and detailed explanations of the shit that hit the fan... in an interesting and funny manner, with just enough of a dash of frustration. This incident proves we are living on the brink... and a domino effect can happen ANY time.
@DuraanAli 5 months ago
I have a small software business. It's not even my full-time job, and I don't ship updates unless they go through unit tests, staging and behavior monitoring with applications like Sentry and Hotjar! It's so weird to see a huge company like that just pushing updates, even if they're urgent.
@hugazo 5 months ago
So tldr: so many avoidable errors, that exploded in the most spectacular way
@kyle7023 5 months ago
"Disasters don't just happen, they're a chain of critical events"
@stevengill1736 5 months ago
The holes in the Swiss cheese line up. ;*[}
@Canleaf08 5 months ago
@stevengill1736 Mentour Pilot vibes.
@robotorfeed6806 5 months ago
painfully relatable
@luketurner314 5 months ago
Aviation has the Swiss cheese model. ClowdStrike had the NO cheese model.
@starnumber12046 5 months ago
As an avgeek, this comment is approved
@stevengill1736 5 months ago
My sentiments as well...
@halbeik 5 months ago
A smoke test would've caught this at any point in the pipeline
@MeriaDuck 5 months ago
Should have indeed.
@cabanford 5 months ago
So this update bricked every machine it was pushed to... Screw the Rolling Updates - didn't they test this on even a "single" internal machine before their Friday push??? WTF. Managers heads (CEO)
@shimirel 5 months ago
Given their list had "local developer testing" we both know the answer ;-)
@coopercummings8370 5 months ago
It wasn't just a Friday push, it was a 10pm Friday push.
@cabanford 5 months ago
@coopercummings8370 Sadly, it was 24 hours of Friday around the world (not just local dev time 🤣🤦)
@coopercummings8370 5 months ago
@cabanford Local dev time absolutely matters in this case though. If they had pushed their completely untested changes Friday morning, they still would have had people in the office to notice they fucked up and roll it back before it blew up as badly as it did. But when they push it at 10pm, that was probably the last guy in the office, and he left immediately after pushing the patch.
@OhhCrapGuy 5 months ago
The ABCs of running a company like Crowdstrike:
Airline: We forgot to check if we put any fuel in the plane, or if the engines worked, or if they were attached, but at least there was a pilot in the plane before takeoff. And it's a good thing that Southwest pilot was on our flight.
Bar tending: We have an excellent bar, high quality glassware, exquisite lighting, a phenomenal sound system, and only the best point-of-sale system available. Let's open the doors, it's time for the grand opening. ... wait, we have to buy the booze? I thought whenever a bartender came in, they'd bring the booze with them! What do you mean "hire a bartender"?
Car manufacturing: Look, we made sure that all the wheels were firmly attached to the vehicle. Sure, one was bolted to the hood, another was attached to the middle of the axle, which caused the axle to snap in half, and the last one was put in place of the steering wheel, but we made DAMN sure that it had all 3 wheels attached this time. ...what do you mean "4 wheels"?
@conceptrat 5 months ago
A little bird told me that the CI/CD process was taking too long and maxing out the processors. So they removed some parts of the process (tests?) and then pushed start again. And here we are. Partially.
@OhhCrapGuy 5 months ago
The tests are maxing out all the processors? That sounds like a *failing test* to me. bEtTeR TuRN oFf ThE uNIt TeSTs
@EwanMarshall 5 months ago
Ooh, tests fail, let's remove the tests... yeah, great idea... The thing is, the number of smaller similar incidents over the last year raises questions about their testing practices, even before their statements. I think gross negligence can easily be made out here.
@Elesario 5 months ago
Sounds like another company that isn't interested in investing development time into the test and relevant toolchains, because they want to focus on their core product.
@itsTyrion 5 months ago
source
@toastrecon 5 months ago
I don’t think that the PR team is necessarily inept, I’d bet more than a few $10 Uber Eats cards that the statement was wordsmithed for hours by engineers, lawyers, executives and whoever else felt like their neck was on the line. Endless revisions until it said everything and yet nothing.
@TehPwnerer 5 months ago
$10.00 gift cards? A slap in the face would be less insulting
@SforSamPlays 5 months ago
They had also rolled back on the gift cards apparently. Which somehow is worse
@Space_US 5 months ago
@SforSamPlays How? Did Uber revoke them?
@SforSamPlays 5 months ago
@Space_US It was mentioned at the very end (I didn’t get that far when leaving the comment). Either CrowdStrike went and invalidated any not used (probably due to people being rightfully upset) or Uber Eats did it because they thought it was all fraudulent (since scammers use gift cards to transfer money and stuff).
@BroudbrunMusicMerge 5 months ago
@Space_US Yes, sorta. Uber detected possible fraud due to the large number of cards and deactivated them.
@Space_US 5 months ago
@BroudbrunMusicMerge lmao
@Z3rgatul 5 months ago
WHQL is not about Windows Update. It's about a digital signature from Microsoft. Windows doesn't allow unsigned drivers to load unless you have enabled debug mode, which was made specifically for driver development. A driver can be downloaded from anywhere, but only Microsoft-signed drivers can be loaded on normal Windows machines.
@Elesario 5 months ago
You can install non-WHQL certified drivers without necessarily being in debug mode. That's what a lot of driver update programs, like the ones likely installed with your graphics card, are doing. Theo does mention this during the video. If you try to manually install a non-WHQL driver you'll probably get a warning, but you can still instruct Windows to install it. Some drivers do get installed by the Windows Update software, and those ones are only ever WHQL certified.
@Z3rgatul 5 months ago
@Elesario I just went to Device Manager -> Display Adapters -> my RTX GPU -> Properties -> Driver tab. Digital Signer: Microsoft bla-bla-bla. You can click on Driver Details and see that every single file in the list is signed by Microsoft. If you don't believe this information, you can manually check every individual sys/dll/exe file, and the Microsoft signature is there. You can't install non-signed drivers starting from Windows Vista. You have to disable driver signature verification at the system level and reboot the machine. There is no warning that can be bypassed.
@wrfsh 5 months ago
3:54 Hey Theo, you're wrong here about graphics drivers. WHQL is not there just to put you into the next Windows update. It allows the driver to be signed, and the Windows kernel won't allow you to load unsigned drivers (normally, unless you're in debug mode and don't have secure boot enabled). So even if you distribute your driver through your own channel, you still need to get it signed, and for some drivers that means going through WHQL. CS sidestepped it with the channel files because they don't actually load those files as drivers, as I think you correctly point out.
@leexgx 5 months ago
20:20 They rolled out the update on a Friday at 10pm PT (GMT-7 or -8, 5am-ish UK/EU), the best time to brick 9 million systems.
@circuitousprime 5 months ago
To be fair it wasn't a Friday evening. It was Thursday at ~11pm CDT (Texas time), Friday ~5am BST (UK time)
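The time-zone back-and-forth here can be checked with a bit of arithmetic. A minimal sketch, assuming the 04:09 UTC release time given in CrowdStrike's preliminary post-incident review and fixed July daylight-saving offsets (PDT = UTC-7, CDT = UTC-5, BST = UTC+1); `RELEASE_UTC`, `ZONES` and `local_release_times` are illustrative names, not anything from the video:

```python
from datetime import datetime, timedelta, timezone

# Release time as stated in CrowdStrike's preliminary post-incident review.
RELEASE_UTC = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)

# Assumption: fixed offsets that hold in July (daylight-saving time),
# hard-coded here instead of relying on a time-zone database.
ZONES = {
    "PDT": timezone(timedelta(hours=-7), "PDT"),
    "CDT": timezone(timedelta(hours=-5), "CDT"),
    "BST": timezone(timedelta(hours=1), "BST"),
}


def local_release_times():
    """Return the release time as 'Weekday HH:MM' in each zone."""
    return {
        name: RELEASE_UTC.astimezone(tz).strftime("%A %H:%M")
        for name, tz in ZONES.items()
    }


if __name__ == "__main__":
    for zone, stamp in local_release_times().items():
        print(zone, stamp)
```

Under those assumptions the push lands on Thursday 21:09 in California, Thursday 23:09 in Texas and Friday 05:09 in the UK, which matches the "Thursday ~11pm CDT, Friday ~5am BST" reading.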
@markm1514 5 months ago
The silver lining is that admins who have things inappropriately hosted on Windows are reevaluating their options.
@akam9919 5 months ago
I don't agree with how things are being said that "the CrowdStrike update is not a content update". From what I've been told, the signatures are glorified config files. That being said, this is lower on the stack than I am familiar with. However, as I see it, this is like adding a card to a digital CCG, but all your cards are basically config files that are evaluated when played, performing special actions, etc. Also, let's be self-aware here: this is Windows and a security company. Do you really want to wait a gazillion hours on a weekend, while a cyberattack is going on, to get your driver approved for an emergency update? As far as I can tell, the problem is not that CrowdStrike ships these as content updates rather than drivers; the problem is they didn't check for a damn null. I would also say that while CrowdStrike's default behavior of "crashing" when it can't read a file is dumb, it is not totally unreasonable to think it should be the default behavior. I can easily see someone thinking... "hmm... if someone tampered with some stuff to the point where I can't read this damn file, I probably shouldn't let us boot". I can also see this being part of some nested if statement, with bad logic causing problems down the line and producing this behavior. I can also see nobody telling the person who wrote the code, "hey, if some weird stuff can't be read, skip the driver. We don't want to be a DoS vector". That being said, I don't like how CrowdStrike is handling this. At the very least, if you're going to be spitting in people's faces with a gift card that can't cover squat, have the damn balls to not give them anything. If you really want to be kind, not only refund/credit your customers and tell us that you are reviewing your internal processes, but fucking apologize! Own your mistake, and show us you will do better.
Also, give all the unthanked IT people who were called away from their friends and family on a weekend because of your mistake a $50 Uber Eats gift card so they can get themselves a goddamn Big Mac.
@alexholker1309 5 months ago
The Digital CCG comparison is an apt one since a game like Hearthstone both has run into the same update latency problem (due to needing Google/Apple to sign any updates to their respective mobile clients) and has prompted the company to bypass that process for emergency fixes (implementing urgent nerfs and bans on the "umpire" server without waiting to get the client update approved).
@jbutler8585 5 months ago
Yeah, this is horrifying to hear about. The fact that files aren't signed means they are a huge open target for malware. Never mind that it would have prevented this disaster; it means the entire product and its design are so fundamentally flawed you can't be sure it's providing protection. It could be doing the exact opposite of what it's supposed to, holding the door open for attackers to do whatever they want, with no way of knowing that it's gone malicious because the kernel driver is untouched.
@TheJacrespo 5 months ago
I would say the exception here is a related tech company working decently according to minimal tech and engineering standards. The norm these days is a big managerial fat layer pushing the problems towards the bottom, where the technicians are burning out without any understanding from the managerial upper layer over the bolts and knots of the implementations.
@teslainvestah5003 5 months ago
I haven't heard any stories of clownstrike deaths yet. I heard one about a woman who permanently lost her breast because she urgently needed a mastectomy, but the hospital couldn't perform the more advanced skin-sparing mastectomy they originally planned because it required blood transfusions that were to be ordered on a windows PC.
@JohnLovell-FTW 5 months ago
If they are a publicly traded company, don't they have to pass compliance audits? OWASP? I wonder what their current score is :). There is no coming back from this as a company, in my opinion. So, what did they actually install? This sounds like vaporware... do the sensors actually do anything?
@gamergirlandco 5 months ago
of course!!! they're very good at taking up disk space :)
@kanbekan 5 months ago
The sensor is also good for farming money from the Fortune 500.
@autohmae 5 months ago
Let's just say even Uber thought they are a scam.
@RipVanFish09 5 months ago
Please do a video on that conversation with the math teacher.
@applepie9806 5 months ago
Seconded. He sounds brilliant; I would love to hear his opinion on all this.
@jfbeam 5 months ago
_"It is not code or a kernel driver."_ (fine print) But it _is_ turned into code by our kernel driver, poorly. _"Based on..."_ (an unending one sentence paragraph) Long winded weasel-words for "WE DIDN'T F...ING TEST IT." Something anyone with a single functional brain cell already knew. Had they loaded that "channel update" on a SINGLE machine, they would've seen it crash, and hopefully would not publish it. *This alone is sufficient reason to NEVER do business with Crowdstrike or anyone who's ever driven past their office.* The next paragraph is weasel-word for "we don't check for nulls". A common mistake in userspace, an inexcusable failure in kernel programming.
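The "we don't check for nulls" point can be illustrated in a few lines. A minimal sketch in Python rather than kernel C, assuming a hypothetical rule engine that looks up one field of a parsed record; `read_field` and `rule_matches` are made-up names and this is not CrowdStrike's code:

```python
from typing import Optional, Sequence


def read_field(fields: Optional[Sequence[bytes]], index: int) -> Optional[bytes]:
    """Return fields[index], or None when the lookup would be invalid."""
    if fields is None:
        # The record failed to parse: nothing to dereference.
        return None
    if not 0 <= index < len(fields):
        # The rule refers to a field this record does not have.
        return None
    return fields[index]


def rule_matches(fields: Optional[Sequence[bytes]], index: int, needle: bytes) -> bool:
    """Evaluate one rule, treating a missing field as 'no match'."""
    value = read_field(fields, index)
    if value is None:
        return False  # skip the rule instead of crashing the host
    return needle in value
```

In user space a missing check like this usually costs one crashed process; in a kernel driver the equivalent mistake takes the whole machine down, which is why the comment calls it inexcusable there.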
@GaryGreene1977 5 months ago
If I were the CTO of a company that uses this product the first thing that would be done is calling my IT staff in and have them remove this tool from all systems in the company. This is pure incompetence
@alenasenie6928 5 months ago
From my perspective, it is funny that this happened here before it happened with kernel anticheats, which can potentially cause the same thing but for users.
@zisaizic4759 5 months ago
it probably has happened, crowdstrike is just very ubiquitous. Delta airlines wouldn't use an anticheat.
@theswordslay3542 5 months ago
Kernel anticheats have already caused some problems, especially intrusive ones like Vanguard. Some people have reported other drivers shutting down because of it, and heck, one report had it crashing an entire PC. The difference is that kernel anticheats are isolated to the gaming community, while CrowdStrike is spread from airline to hospital systems.
@SomeThingOrMaybeAnother 5 months ago
It has happened in kernel anticheats before. They just aren't as widespread.
@alexholker1309 5 months ago
It would be nice if this debacle made kernel-level anticheat too hot to handle.
@ElmerGLue 5 months ago
@alexholker1309 It won’t. Gamers aren’t corporations, and without anticheats half of online gaming would turn into a wasteland, or devs would need to up the costs of server management, and that would mostly benefit larger game companies and raise the barrier of entry for small creators.
@mattilindstrom 5 months ago
I wonder if ClownStrike felt the update was so urgent a rolling release was clearly out. My feeling is their process is just deeply defective: the testing failed, the roll-out failed, and their response to all of this has failed. If I had any loose money, I'd be shorting their already battered stock so hard.
@autohmae 5 months ago
The blog post seems to confirm there was no rolling release update process at all.
@starnumber12046 5 months ago
Cloudflare once did this, then they DoS'd themselves because the update contained catastrophic backtracking.
@mattilindstrom 5 months ago
@starnumber12046 I remember that. It's what one gets for not understanding how regexes may have horrible scaling built into the way they work. The usual regex libraries are optimized to the hilt with the compilation to an FSM, so a pure PEBCAK situation there.
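The "horrible scaling" is easy to underestimate. A minimal sketch, assuming a hand-rolled backtracking matcher for the single pattern `(a+)+b`; this is a toy model written for illustration, not Python's `re` engine and not the rule Cloudflare shipped, and `count_backtracking_steps` is a hypothetical name:

```python
def count_backtracking_steps(text):
    """Match (a+)+b at the start of text by naive backtracking.

    Returns (matched, steps), where steps counts how many candidate
    group boundaries the matcher tried before it finished.
    """
    steps = 0
    n = len(text)

    def match_groups(pos):
        nonlocal steps
        # Find the end of the run of 'a' characters starting at pos.
        end = pos
        while end < n and text[end] == "a":
            end += 1
        # Greedy first: try the longest group, then ever shorter ones.
        for stop in range(end, pos, -1):
            steps += 1
            if stop < n and text[stop] == "b":
                return True  # the trailing 'b' matched
            if match_groups(stop):  # otherwise start another (a+) group
                return True
        return False

    return match_groups(0), steps


if __name__ == "__main__":
    for length in (5, 10, 15):
        print(length, count_backtracking_steps("a" * length + "c"))
```

On ten `a`s followed by a `c`, this toy matcher tries 1,023 ways of splitting the run before giving up, and every extra `a` doubles that, while an input that does end in `b` succeeds on the first attempt. Engines that do not backtrack avoid the blow-up for patterns like this one.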
@HarithBK 5 months ago
What I find shocking is that when all these companies purchased CrowdStrike's services, they didn't bother asking the sales team basic operational questions that wouldn't be secret. The IT people at these companies also need to be reamed for their incompetence in not doing basic checks on the software they order.
@GackFinder 5 months ago
Agreed. This responsibility largely falls on the CIOs and CTOs of the companies that bought into ClownStrike. But... I've been an IT consultant for some 20 years now, and I can tell you, CIOs and CTOs are in general extremely incompetent from a technical standpoint. They are usually in the positions they are in because they are good talkers, or because they are friends with the CEO.
@itskdog 5 months ago
I realised only now why there's no apology - the lawyers told them not to, as apologies are taken as admissions of guilt in the courts and will guarantee a defeat, whereas by not apologising they have a chance, however slim, of minimising the damage.
@zoeherriot 5 months ago
The issue is that at Ring 0 in the Windows kernel, a crash means a BSOD. That's intended behavior. But they clearly didn't validate that the file they loaded into that driver had valid data. Easy mistake to make, easy one to catch before you push it to the world. They had a similar issue with some Linux distros earlier in the year. The "system performance" checking may be related to that. But there is a pattern.
@MyBlogsTV 5 months ago
And this is why you don't push to prod on Friday
@nicejungle 5 months ago
With Linux, you can reboot with the previous kernel; it's just a selection menu at startup. On Windows, you're screwed because it's a toy OS.
@zoeherriot 5 months ago
@nicejungle Um... you weren't really any more screwed on Windows than you would have been with Linux. You would still need to physically go to the machine and change the configuration or, in the case of Windows, delete a file.
@nicejungle 5 months ago
@zoeherriot One major difference:
* on Linux this can be done in one reboot without any technical knowledge
* on Windows, you have to use magic to know the faulty file to delete
@tonytsai6600 5 months ago
@nicejungle Yes, Windows sucks, but if this incident had happened on Linux to this extent, it would still have been a disaster; IT would still need to fix these servers physically or remotely, and the whole world would already have been impacted. Windows sucks here because BitLocker hinders you from fixing the issue quickly, but BitLocker is also a good tool for protecting your data. So my understanding is that no OS is perfect when something unexpected happens, and that’s the reason why we need to buy insurance.
@maciejtrybilo 5 months ago
Love how they surpassed the "works on my machine" memes by claiming that local developer testing is only going to be introduced now.
@GreyDeathVaccine 5 months ago
They don't know when to shut the fuck up for their own sake.
@lornova79 5 months ago
Crowdstrike developers use only macOS so local testing the Windows sensor would be complicated...
@melimsah 5 months ago
Seeing you progressively lose your mind throughout the video is a work of art.
@Hexanitrobenzene 5 months ago
Yeah, this video is nerdy stand up comedy show :)
@wlockuz4467 5 months ago
The incident was an accident, but the gift cards were planned, approved and handed out with an intent. This just goes to show that they care more about saving face than help their customers. Their service costs upwards of $180 per device per month. If you make a very conservative example of a company where 100 devices need Crowdstrike, that's $18k a month. The $10 gift card means absolutely nothing in comparison, if anything it's an insult, a f*** you. If they really want to show that they care, they should waive their service charges for a few months for the affected customers. Anything less would be an insult. Last but not least, this is just wishful thinking, but this should be treated as a cyberattack with Crowdstrike brought to court and held accountable.
@CC21200 5 months ago
I've been saying that I have personally permanently lost more time and data due to automatic updates than from all other malware combined, and that's considering that I often go to dodgy websites. Are you sure the $10 voucher thing is real? Because it sounds fake... as in I think we're beyond incompetence here, and that you'd have to be a true sociopath to make that offer, or maybe someone who's deliberately trying to tank the company. Can't believe I'm saying that but I'm starting to seriously think the response is more damaging than the crash itself.
@daveh0 5 months ago
It's absurd to say automatic software updates are the cause. This is caused by local admin/root on thousands of machines. Allowing them to make changes everywhere at that OS level is the problem.
@shishsquared 5 months ago
The best part is it's actually not a "binary file" if it's all zeroes.
@nibblrrr7124 5 months ago
you don't even have to open the unary file, just stat it to get the length and you have all the information encoded in it. GENIUS!
@the-answer-is-42 5 months ago
@nibblrrr7124 It's the best optimization! No parsing of whatever is in the file is needed! Just make sure some of your zeros are more zero than other zeros and I promise, it will work perfectly!
@tma2001 5 months ago
That turned out to be a red herring and nothing to do with the crash, as CrowdStrike clarified in an initial post. Not everyone had an all-zeros channel file - valid files have a magic byte signature of 0xaaaaaaa at the beginning. Disassembly of the actual kernel driver shows this is checked for. Neither does a channel file contain code. Experienced hackers have speculated that channel files are initially preallocated and only updated after the temporary downloaded file is parsed correctly, but of course the kernel driver crashed and left it in this state. This makes sense of all the facts. The actual null ptr dereference was due to a faulty field in the channel file that led to a memory allocation error from the non-paged memory pool; either the allocation was the incorrect size or it was misaligned. The BSOD exception is a non-paged access fault. The rest is history, with the lack of staged rollouts of this hotfix.
5 months ago
@tma2001 That makes so much sense! Does anyone have a copy of the original file? It would be great to see that instead of information based on misinformation from a Twitter post (the one about the file being all zeros that went viral).
@tma2001 5 months ago
CrowdStrike published a Tech Analysis: 'Channel File May Contain Null Bytes' explanation at the time of the outage which I and everyone else missed which finally clears up that mystery! My intuition was correct but it is Windows itself rather than the CS driver that first erases data of disk sectors for a newly allocated file for security reasons. Writes don't occur until file is flushed by Cache Manager which doesn't happen if the driver crashes with a BSOD.
@emaayan 5 months ago
one of the comments on dave's video said that a customer actually DID configure staggered rollouts on their servers, but CS actually ignored it.
@CapsAdmin 5 months ago
Did they ship zeros or did their "content update" become zeros right before testing and shipping? Obviously the receiving end should also have some sort of validation other than "graceful" runtime exception handling.
@tma2001 5 months ago
The all-zeros channel file is a complete red herring, and Theo keeps propagating it and coming to the wrong conclusion - the initial reports about this were rightly causing a lot of confusion. However, not all customers had an all-zeros channel file - CrowdStrike clarified this was not the issue, and valid files have a magic byte signature at the start, which is checked for, as disassembly of the kernel driver has confirmed.
@itsTyrion 5 months ago
@tma2001 Let's remove that part then; most points still stand as-is or slightly altered.
@tma2001 5 months ago
@itsTyrion Agreed - CrowdStrike actually tell us in this press release what the issue was, and it's a shocking admission. One word in that press release gives the game away - _trust_ . The 2 previous named pipe template instances had worked in production as expected, so it was now trusted - they _assumed_ another slight variant of the same kind wouldn't cause any problems. And as a rapid response update, it is not subject to policy staging controls; the rest is history. Basically, complacency led to their downfall.
@junit1606 5 months ago
Aren't you the one who was saying that software testing is not needed, and it's better to let it fail on users' land?
@GreyDeathVaccine 5 months ago
Different use cases. Please don't compare apples to oranges.
@junit1606 5 months ago
@GreyDeathVaccine How does testing differ from testing? And how is letting the application fail for the user not what he is doing?
@kamertonaudiophileplayer847 5 months ago
FYI: Austin, TX is 2 hours ahead of California.
@Flameboar 5 months ago
The time was what used to be called GMT, not CDT.
@kamertonaudiophileplayer847 5 months ago
@Flameboar It makes a lot of sense.
@NormanLyon
@NormanLyon 5 months ago
I agree with the fact that CS messed up bad. But I have disagreements with your analysis.
1. What I understood from their RCA was that their validation test suite failed in such a manner as to not generate logs from the most recent update, but to instead use logs from previous updates. Yes that's horrible, but it's a bug, not a lack of testing.
2. The fact that validation was only happening on send was a problem. Validation of received updates must occur for anything to be considered resilient. The driver should have refused the update, and stayed on the previous version (sending an alert of failed update). Failing that, it needs to version itself for some level of rollback to previous known good (and send alert of failed update).
3. I can understand the lack of booting on valid update. Many in the security world would call assuming you are protected while protection is actually broken a worse state than knowingly having no protection. When dealing with corporate and government regulations, these concerns are real.
4. I agree that automatic update to "latest and most likely to crash" is a horrible stance. It's also the stance taken by many in the IT industry today as a means to combat never updating. Enterprise software should have a middle ground if it's ever going to be taken seriously as an enterprise solution. Canary testing and staged promotion lifecycles need to be taken seriously.
5. WHQL is also at fault here. Security vendors require moving at a quick pace, and as MS has a near-monopoly on OS for end-user systems, regulatory concerns require that any access MS security teams can take advantage of also be usable by non-MS entities. WHQL needs to be able to provide meaningful and quick value in cases such as security products. The fact that this driver has these faults yet is certified is proof that CS deserves their certification pulled, and must re-certify under a worthwhile process.
6. Common corporate implementations of BitLocker make this event a nightmare. Too many companies believe that they empower their employees to manage their own devices. Yet very few users were able to access the BitLocker passcodes for their systems. Each company that fell victim to this situation failed miserably at their disaster recovery tabletops and needs to fix this ASAP.
I wouldn't want to do business with CS now. Yet any enterprise-wide product of this nature is difficult to change. CS's competitors have their own problems. The general mindset I've seen from every vendor in the security focused enterprise software market is that they are filled with unjustified bravado. This incident won't change that. I just hope it will make some things better, especially as the real issue doesn't completely belong to CS.
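For anyone wondering what @NormanLyon's "validate on receipt, otherwise stay on the previous version and send an alert" could look like, here is a minimal sketch. It is Python for readability, not kernel code, and everything in it is an assumption made for illustration: `apply_update`, the `expected_sha256` manifest value, and the `alert` callback are hypothetical names, not anything from CrowdStrike's actual product.

```python
import hashlib
from typing import Callable


def apply_update(current: bytes, candidate: bytes, expected_sha256: str,
                 alert: Callable[[str], None]) -> bytes:
    """Return the content to load: the candidate if it verifies, else the old one."""
    actual = hashlib.sha256(candidate).hexdigest()
    if candidate and actual == expected_sha256:
        return candidate
    # Refuse the update, keep the previous known-good version, and report it.
    alert(f"update rejected (sha256 {actual[:12]}...); staying on previous version")
    return current
```

Note that a digest check like this only catches a file that was corrupted or truncated on the way to the machine. A file that was delivered intact but is logically wrong would still pass, which is why the canary and staged rollout points still matter.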
@RichardHennigan 5 months ago
Even a non tech person would come up with the idea of rolling releases to make sure things were working. There MUST have been multiple people in the company pointing out this dangerous flaw.
@wlockuz4467 5 months ago
It doesn't even have to be something as complex as rolling releases. Something this severe would be easily caught if they actually tried to run their production build.
@marknn3 5 months ago
I wonder if CrowdStrike is using the new Microsoft Dev Drives file system to create the content update files. I have experienced (and reported to MS) files ending up with all-zero content. It was related to Dev Drives + a virus scanner (an F-Secure derivative). After uninstalling it and switching to Defender, no problems since.
@dondekeeper2943 5 months ago
Crowdstrike is now officially a ransomware service provider 🤣
@frankroos1167 5 months ago
Erm... as a developer I am used to one thing: the least we have done before releasing an update package is test the installation of the package, as it will be released, in a safe environment. Obviously, that test is missing. Which is mind-boggling to me. In the almost 30 years of my career, I have not worked for any company that didn't have that test. And the companies I worked for are not all big companies. Some as small as a company built around a team of 4 developers. It is so basic. And so everywhere I have been. I can't even imagine a company that doesn't have it. And now there is such a company that is this powerful? I am scared. Really scared. As far as I am concerned, CrowdStrike should go bust, just for having that test missing. I don't want them to improve. I just want them gone. If you make a mistake like this, you are bound to make more. Question for MS: Why a certification for a driver that has these shortcomings? Null checks and verifying unsigned files getting loaded. In most code it shouldn't be too hard to see these things are needed and missing.
@debasishraychawdhuri 5 months ago
Imagine Tesla pushes an update that makes a sudden left turn at random times.
@kingofichigo 5 months ago
I wouldn't be surprised tbh
@EscapeMCP 5 months ago
@@kingofichigo Yeah, seems immeasurably possible
@ytlongbeach 5 months ago
Theo, thanks for putting this out there. 99% of people would never understand what you laid out, without such an incredible run down on the CrowdStrike failures. This video should be required viewing for all large company executives and the entire US congress. kudos !!
@tma2001 5 months ago
except he was wrong on the all zeros channel file - that turned out to be a red herring but it was initially very confusing because not everyone had an all zeros file. CrowdStrike made this clear in their initial blog post (not the more detailed one in this video).
@nu11pointerexception 5 months ago
CrowdStrike Blue Falcon is a more fitting name after bricking millions of computers
@drankthetranquil 5 months ago
This was a great video and I really appreciated the deep dive and discussion. This is almost just... unbelievable haha
@stevengill1736 5 months ago
My question would be, can Crowdstrike even survive this incident? If I was an IT manager, other options would be of interest right now...
@piccalillipit9211 5 months ago
*SIMPLE FACT OF LIFE - NO ONE IS GOOD AT THEIR JOB* We like to think we are, we hope others are, but in reality, we are at best less incompetent than most others and we call that "GOOD" - we didn't evolve to do constant work like a machine and we are really BAD at it. The guy hand building a car in a small team start to finish HE is truly GOOD at his job, the guy coding a product he has no interest in for a giant company - NOT GOOD.
@JohnDoe4321 5 months ago
02:31 - WHQL is NOT a "hard certification to get". For a "software driver" like anti-malware, all that really gets tested is that the driver correctly implements the PNP and power management state machines. That isn't easy to get 100% right if you try to write the code from scratch, which is why nobody does that. Everybody either does a copy/paste from one of the Microsoft sample drivers, or uses Microsoft's KMDF framework, which handles it for you. WHQL requires more in-depth testing for known device classes, such as storage, network, and video. These drivers have to pass class-specific tests, which more fully exercise the driver functionality.
@animanaut 5 months ago
lol, so they introduce 'works on my machine' just now? where tf did they start to begin with?
@MiklosGalicz 5 months ago
Aren't external files in this case the same as virus definitions for defender and all of the other virus scanners out there?
@autohmae 5 months ago
Yes, similar, so similar in fact McAfee had a similar issue in 2010, when the clownstrike CEO was the CTO at McAfee (!)
@hungrymusicwolf 5 months ago
15:46 "gracefully handle" - As gracefully as a nuke and twice as !quiet.
@alexedelweiss3267 5 months ago
I just wonder what were they doing with all the money they receive, because it's a really expensive product... So expensive that it's unaffordable for most companies.
@MichaelSchuerig 5 months ago
I don't have any special insights into the situation, but my hunch is that the fault occurred in a place where nobody thought it could happen. Let's assume the "content update" is checked and perfectly valid. This update needs to be provided to clients by some service. That service could simply be buggy and misbehave for various reasons, resulting in corrupt data being transferred. Of course, this risk could be mitigated by validation on the clients and rolling releases.
@wulf2121 5 months ago
There's one small misunderstanding in the beginning on your side: WHQL certification is not (just) about using Windows Update to push drivers. All the newest drivers that can be downloaded from Nvidia's website are WHQL certified before Nvidia makes them available as well. Basically any device driver running in the kernel needs to be certified, or Windows will complain upon installation. (In the Home version nowadays it's completely blocked to install such a driver, and only with a Pro license do you even have the option to override this. In the days of Windows XP there was still the option to ignore the warning from Windows and install the driver anyway.)
@JohnDoe4321 5 months ago
06:35 - This is wrong. Drivers for actual hardware execute different code paths based on what they receive from the device. This isn't that different than Crowdstrike's driver executing different code based on a definition file. Any driver that doesn't fully validate all data can crash. Real world example: There was a network card (NIC) which had its own firmware. The vendor updated the firmware, and soon after that, reports started to come in about Windows driver crashes. Crashes were infrequent and intermittent -- it took a while to root cause. It turned out that the new firmware would sometimes send data to the driver that the driver didn't interpret properly. This wasn't a failure of the WHQL process. It was a bug, and Bugs Happen. WHQL is intended to reduce the number of driver bugs -- it never promised to eliminate 100% of them.
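To make "any driver that doesn't fully validate all data can crash" concrete, here is a small sketch of the kind of check that has to happen before trusting a length field that came from a device or a file. It is Python, so the slice would not actually overrun memory the way C would; the explicit check stands in for what kernel code must do by hand. The `[2-byte length][payload]` record layout and the name `read_record` are made up for this example.

```python
import struct
from typing import Optional


def read_record(buf: bytes) -> Optional[bytes]:
    """Parse a [2-byte big-endian length][payload] record, or None if malformed."""
    if len(buf) < 2:
        return None  # not even a complete length field
    (length,) = struct.unpack_from(">H", buf, 0)
    if length > len(buf) - 2:
        return None  # declared length overruns the buffer: reject, don't read
    return buf[2:2 + length]
```

The point is only that the declared size is compared against what was actually received before anything is read; a caller gets `None` and can refuse the input instead of walking off the end of it.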
@Pekz00r 5 months ago
Several of the steps you suggest would delay the delivery of the update significantly, and in this case every minute's delay can be very critical. Even every second can be important when you're racing against a virus that is spreading rapidly. You need to reach the machines before the virus. But yeah, you are absolutely right that there are many things they could have done that would not delay the rollout much, but still prevent a bad rollout reliably. It is, as you said, insane that they are rolling out boot driver updates to millions of machines without more verification.
@ForLineage-dr5ju 5 months ago
The channel file is probably not being signed. That's why it tried to run a bunch of zeros. Unless somehow a bunch of zeros got signed, the change was never caught with a diff... and no further testing was done. Just pushed out to everything, everywhere all at once. They could have tried it on a work laptop in the office. I bet they don't have the ability to even push to one targeted test machine. I bet their options are "Push to Airlines or push to hospitals for testing".
@CodingAbroad 5 months ago
Rumour has it Microsoft is already looking at sacking off CrowdStrike
@MacGuffin1 5 months ago
This is a very surface-level look. You should see Enderman's recent video on Dave. This attack vector was well known and it was almost certainly an attack that was about money, just like the denial of it being an attack was too. I'm sick of telling everyone about the WSB post where the same guy made millions on this AND Snowflake by shorting them hours before they were hacked. How can you red-team something that doesn't boot? Not to mention hashing, manifests etc. Have a look at the Cobalt module it was actually patching for, it would have required a lot of testing. As if they shipped this willingly on a Friday night... Anyhoo, they are still the best choice in this area, all these guys have issues
@Henoik 5 months ago
As a cyber security professional, what freaks me out about this is that if a threat actor were to reverse engineer the driver, they could basically inject instructions into the driver that turn the Falcon agent into a C2 client. Also, just about every EDR solution and antimalware solution does this. The "You need to restart your PC" screen you get after installing an antimalware solution? Yup, that's the antimalware solution letting you know Windows needs to start all the new drivers you just installed.
@2rx_bni 5 months ago
Glad to catch the beginning of this since I showed up to stream late and only caught part of it.
@tma2001 5 months ago
Not everyone had an all-zeros channel file, which led to more confusion. CrowdStrike, if we believe them, made it clear that null bytes in this or any channel file were not the issue. This is lent weight by the fact that valid channel files have a magic byte signature, which disassembly confirms is checked for. On a hacker forum some speculated the file was pre-allocated but not updated from the original download after the driver crashed. So all-zero channel files are probably a red herring.
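A quick illustration of why a magic byte check supports @tma2001's argument: a file consisting only of null bytes cannot start with a non-zero signature, so it would be rejected before any parsing. This is a sketch only; the 4-byte `MAGIC` value and the function name `has_magic` are placeholders invented for the example, not the real channel file format.

```python
MAGIC = b"\xaa\xaa\xaa\xaa"  # hypothetical 4-byte signature, for illustration only


def has_magic(blob: bytes) -> bool:
    """True only if the file starts with the expected signature bytes."""
    return blob[:len(MAGIC)] == MAGIC
```

With this check in place, `has_magic(bytes(1024))` is false for a kilobyte of zeros, so if the driver really does verify a signature, an all-zeros file would have been skipped rather than executed.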
@gregutz4284 5 months ago
I don't believe a lick that CS is pushing.
@silvioschurig749 5 months ago
You kind of ignore that this is running in kernel space. Trying to gracefully recover from a fatal kernel error is riskier than an immediate shutdown because it is not just some application process that failed. Basically all you know at that point is: your kernel is corrupt. Also a huge issue, not just with this company: the terms of service. CrowdStrike basically puts a disclaimer in there that their software / services guarantee nothing, are not suited for production environments, and that they can't be held liable for anything they do or cause, directly or indirectly. This next statement is victim blaming, and terms of service like this seem to be standard for these 'security' companies... But really: why does anyone sign license / service agreements with a company that cannot guarantee any level of function for their products? I mean, if you hire some company like Securitas to handle the cash collection from your points of sale, would you sign terms of service stating "we may or may not come by and pick up your cash. If we do come by we may lose some or all of it prior to depositing it to your bank. You are happy to assume that risk."? But for software on your production systems that's OK?
@grokitall 5 months ago
you cannot recover from the initial crash, you can recover from the boot loop which kept all the machines down. as to the disclaimer, leonard french covered it on youtube, and it is invalid in the case of negligence, which this clearly is.
@DarkerStarSword 5 months ago
The file was NOT full of zeroes - I've verified that was not the case on two affected machines. I suspect the people who saw a file full of zeroes experienced a crash before the data was flushed from buffers in RAM to disk.
@shotgeek 5 months ago
I was admitted to a local hospital the day before this. On Friday almost every hospital system (imaging, laboratory, even food service) was down.
@JonathanSwiftUK 5 months ago
The kernel mode driver did not crash the system, as such, it only faulted when the channel file with zeros was executed / attempted to be used. We fixed most by deleting the file remotely, no safe mode, no moving the drive to another machine and deleting the file, then moving the drive back, simple and quick, and when we deleted that file the system continued to boot. Job done. Just removing that fixed it. They probably have their own definitions language or pseudo-code, like WASM.
@kamilkardel2792 5 months ago
Giving control of deployment to the customer is actually a smart move because, as opposed to the vendor, the customer knows the exact roles of specific machines. With that control, you can make sure that a bad update doesn't take down all servers running a specific application or all user workstations in some department.
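As a rough idea of what customer-controlled deployment could mean in practice, the sketch below spreads each role's machines across update waves, so that no single wave contains every server of one application (as long as the role has more than one host). The function name `plan_waves` and the role-to-hosts mapping are assumptions for the example, not a feature of any real product.

```python
from typing import Dict, List


def plan_waves(hosts_by_role: Dict[str, List[str]], waves: int) -> List[List[str]]:
    """Spread each role's hosts round-robin across `waves` update waves."""
    plan: List[List[str]] = [[] for _ in range(waves)]
    for role in sorted(hosts_by_role):
        # Round-robin within a role so its hosts never all land in one wave.
        for i, host in enumerate(sorted(hosts_by_role[role])):
            plan[i % waves].append(host)
    return plan
```

For example, with two database hosts and three web hosts split over two waves, each wave gets one database host and at least one web host, so a bad update in the first wave leaves half of each service untouched. A role with only one host still ends up entirely in one wave; that case needs a standby or a manual decision.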
@landmanland 5 months ago
With gracefully they mean “we let the driver boot without the file”.😊
@gabotron94 5 months ago
File not found. Improvise? (y/n)
@mchisolm0 5 months ago
I did not realize the assumptions I made when learning about this... thanks for taking the time. The reality of getting out of the signing of code by moving the code to another file... crazy. Could Windows have caught that in the signature step? Could they have been like "Hey, it looks like you are running code in this channel file over there... but you didn't tell us what the content of the channel file was. You think we can sign off on that?"
@acerreteq703 5 months ago
Thank you for this video analyzing the matter and for letting you feel so much pain in the process of producing it. But don't overdo it, we need you. 🖖
@Wampa842 5 months ago
Crowdstrike (noun): a large-scale outage or damage of infrastructure caused by the combination of an external vendor's action and industry-wide reliance on said vendor. Example: "The latest crowdstrike has caused a massive disruption of emergency services." Crowdstrike (verb, simple past and past participle "crowdstruck"): to cause such an outage to happen. Example: "Our boss moved our email service off-site and now we can't access it because of the outage, I knew he'd be the one to crowdstrike us!"
@dronicx7974 5 months ago
I hope the literally unusable (because Uber Eats cancelled them thinking they were all fraud gift cards) $10 gift card is able to offset some of Delta's costs with their literal 2 years' worth of flight cancelations within a week
@hsvandrew 5 months ago
What makes me so mad about these failures (as a software engineer) is that unlike in other industries (e.g. construction), there will be no punishment for this negligence and misconduct. If the CTO was a construction engineer, jail time would be associated with this deliberate incompetence. Yet in IT, negligence is just allowed despite the financial impact or lives affected. It is about time this changed so that good people doing the right thing are rewarded, and those that aren't get punished. CrowdStrike's share price before this incident was so high, they effectively had 'unlimited money'. The fact they were able to operate like a part-time developer is just insane. This is also a test for corporate America and the CTOs who signed up for this product without any due diligence. Will they actually remove this product & company from their systems, or by lack of action promote this type of behaviour as acceptable?
@TheMrbrookster 5 months ago
Clearly they don't bother running the software, full stop. Imagine if these clowns were running CrowdStrike on their own servers: all their systems would have been down for a week before they could roll back the change.
@phillipsusi1791 5 months ago
A rolling release means you get the latest stuff right away all the time. That is exactly what they did. As opposed to a traditional release, which is tested thoroughly in development, allowed to settle with more testing, used by early adopters for a while, and THEN recommended for everyone else to use, but they all still have to choose to install the new release, when they feel like it.
@Flameboar 5 months ago
There is an interview on YouTube of the Clownstrike CEO after the BSOD disaster. He was blinking at a very high rate and coughed as he tried to answer the interviewers' questions. This gave me the impression that he was fully aware of how badly CS had f'ed up.
@television9233 5 months ago
This is unfathomable levels of disastrous failure in all levels of their pipeline. I would legitimately have more faith in them if they said that they were in fact hacked. I would never understand any competent engineer still using their services.
@Canleaf08 5 months ago
It shows how brittle the computer infrastructure is. Been a dev since 2020. The Log4Shell / Log4j vulnerability from late 2021 feels very small.
@TheStevenWhiting 4 months ago
24:51 Maybe but not if the drive is encrypted with Bitlocker, then you wouldn't be able to read the drive to do that.
@ADHJkvsNgsMBbTQe 5 months ago
Once again, people seem surprised that the emperor isn’t actually wearing any clothes.
@tablettablete186 5 months ago
24:05 That actually makes sense, otherwise the software wouldn't be able to monitor things during boot. Like this:
-> OS starts
-> Driver loads (but does not parse the rules)
-> Malware runs invisibly
-> Driver parses the rules and starts monitoring the machine
It's pretty common to run at boot and start monitoring as early as possible.
@jeremysollars5922 5 months ago
Any sufficiently advanced incompetence, is indistinguishable from malice. I've been getting a lot of mileage from this quote recently xD
@ambhaiji 5 months ago
Theo is Gus from Psych blasting at CrowdStrike(Shawn) for all the alternative choices he has in handling how to deal with the escaped prisoners on the boat.
@3ventic 5 months ago
4:02 by the time nvidia's drivers are out of beta, they're WHQL signed. Getting 3rd party driver updates via Windows Update is relatively new (2016) and nvidia seems to have preferred to bundle it with their own app along with all their other functionality (which first released 2013).
@matthewstott3493 5 months ago
CrowdStrike has claimed that the content update was not all hex zeros. It was instructions and it had a bug, that crashed the kernel and due to Windows handling of NTFS files when that happens the file was replaced with zeros. It still managed to keep loading and crashing Windows at every boot.
@Wlerin7 5 months ago
So... what I'm getting from this is that CrowdStrike is basically crowdsourced security?
@Skirakzalus 5 months ago
When first hearing about CrowdStrike it took me way too long to figure out that this was not the name of a deliberate cyber attack but a supposedly legitimate company.
@NankitaBR 5 months ago
That's why I *always* tick off the automatic updates in everything where I have the option to do it. I'm not a tech-savvy person, but even I know that if there is a broken update on something I use if I don't have automatic updates I can figure out I shouldn't update at that moment and wait for them to fix the issue.
@jordanjackson6151 5 months ago
The ending said it all. Yikes!
@epicmap 5 months ago
17:42 can you imagine a validation of a file which would pass when the file is all zeros? It's hard for me to come up with at least one such validation.
@DaxSudo 5 months ago
It’s like the rust neon crate. It’s not part of JavaScript or your react ecosystem or your node stuff but it is a node binary that exports functions. Oh, we’re just going to ship a bad binary that crashes everything. But even that’s a misnomer because the binary is configured to run in the node run time rather than be an obfuscated configuration file.