Great video about a guy giving a presentation about the inner workings of an x86 execution pipeline. Wonderful gestures, I've seen few people point at slides like this presenter. And the way he sometimes moves his head up and down... Marvelous! The topic of the presentation seemed interesting too, but it is difficult to tell without actually seeing the slides :(
@ankushm3t 3 months ago
Slides in description ^
@edgeeffect 7 days ago
@@ankushm3t Yeah... but "Slides in video ^" are so much easier!
@test143000 8 years ago
Very bad video composition: the speaker discusses something on the screen, but instead of the screen with the code we keep watching the speaker for a prolonged time, followed by only a short glimpse of the screen.
@me0262 5 years ago
Agreed. It's a good talk, but the editing is horrible. The presenter goes over multiple slides that are crucial to the talk, but these slides are not shown in the video. Focus more on the slides than on the presenter.
@LemonChieff 5 years ago
Pretty sure it's automated… I mean it has to be. Nobody is that bad at editing, right? RIGHT?
@mekafinchi 2 years ago
The more I see about x86 and RISC, the more interesting it gets. x86 is really hard to decode, but the equivalent RISC encoding would be 4-6 times longer, and then x86 can turn around and convert the tiny CISC instructions into VLIW-level internal representations. tl;dr: it's just really cool.
@Carutsu 9 years ago
Great talk indeed. If I may suggest, please add the slides as a picture in picture view rather than switch to it. A lot of the time the speaker is making a reference that is lost because there's no way to see the slides.
@LemonChieff 5 years ago
It's simple, really: first read the 5000-page manual, then read the "RELATED LITERATURE" and then you're halfway there. Then just read the AMD manuals and realize it actually makes sense now but you're also 10 years older.
@evilone1351 7 years ago
Excellent talk! Conveying CPU internals in such a clear way is a talent.
@angelsbmartin3431 9 years ago
That was really a great talk, highly recommended. Thanks, Matt.
@PingPong-em5pg 8 years ago
The person responsible for this "montage" should change profession and stay away from any video production related jobs. Where are the slides? Did the guy even watch it? *Sigh*
@KonradTheWizzard 2 years ago
One little correction at the end: while it is true that electrons don't move anywhere near the speed of light (they "drift" at centimeters per second or less), the electric signal does move at the speed of light. Specifically, the speed of light in the wire (of course for a very loose definition of light: wires are "transparent" to signals, which are electromagnetic, like light) - this is about 75-90% of c (the speed of light in vacuum), depending on the specific material. One note for the incurably curious: when he says "very few electrons" in a RAM cell - for modern RAM or Flash cells that is in the region of a few hundred to one thousand electrons - it used to be a few tens of thousands in the '90s. That is femtofarads in storage capacity. In comparison, for an LED to briefly light up, millions to billions of electrons have to move.
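For anyone who wants to play with the electron counts above: charge, capacitance and voltage are tied together by Q = C * V, and one electron carries the elementary charge. The sketch below only does that conversion. The 1 V cell voltage is an assumption picked for illustration, not a figure from the talk or the comment, and the function names are made up for this example.

```python
# Relating "number of electrons in a cell" to capacitance via Q = C * V.
# The cell voltage used in the demo is an ASSUMED illustrative value.

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs per electron (exact SI value)


def capacitance_for(electron_count, voltage_volts):
    """Capacitance (farads) that holds `electron_count` electrons at the given voltage."""
    return electron_count * ELEMENTARY_CHARGE / voltage_volts


def electrons_stored(capacitance_farads, voltage_volts):
    """Number of electrons stored on a capacitor charged to the given voltage."""
    return capacitance_farads * voltage_volts / ELEMENTARY_CHARGE


if __name__ == "__main__":
    assumed_voltage = 1.0  # volts, hypothetical
    for count in (300, 1_000, 30_000):
        femtofarads = capacitance_for(count, assumed_voltage) / 1e-15
        print(f"{count} electrons at {assumed_voltage} V -> {femtofarads:.3f} fF")
```

Changing the assumed voltage scales the capacitance inversely, so the result is only as good as that assumption.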
@johnmccrary2959 9 years ago
Skip to 10:56 if you already know Intel assembly.
@eXcalibooor 7 years ago
The slides aren't visible half of the time :/
@frutiboy1 9 years ago
Great talk. Needs to be watched several times to grasp the vast information in there. I made it all the first time:)
@dascandy 8 years ago
Wrt instruction decoding: you start by deciding in parallel, for each location X among N bytes at a time, how long the instruction would be if one started there. The cycle after, you do this for the next N bytes (and so on). In the second cycle, you sequentially add up the lengths to get to the limits of the valid instructions and ship those instructions off to the actual decoder, still with implied links between them wrt dependencies. The step after that then outputs a bitfield of the register changes each instruction makes, so that if there's an actual dependency on a previous instruction it can be delayed. I'm more impressed that they made this kind of stuff work with self-modifying code - which is like throwing a giant wrench into this entire machine & making it do one instruction per full pipeline flush.
@JohnSmith-he5xg 8 years ago
Great discussion of the tradeoffs of RAM vs cache @ ~48 mins
@andrewmartin3671 10 years ago
Nice talk (though I've not seen all of it yet). Branch prediction first came up for me in this relatively well-known Stack Overflow question: "Why is processing a sorted array faster than an unsorted array?"
@MattGodbolt 10 years ago
Yup - I cover that and indeed demonstrate that even in a language like Python you can see similar effects to that SO question.
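For anyone who wants to try the experiment from that question at home, here is a minimal sketch in Python. It is not the code from the talk, just the usual shape of the test: sum the elements above a threshold, once on shuffled data and once on the same data sorted. The array size, threshold and repeat count are arbitrary choices, and whether you see a gap (and how big) depends on the interpreter, CPU and system load, so no expected timings are given.

```python
# Sorted-vs-unsorted experiment: identical work, different branch predictability.
# Sizes and thresholds are arbitrary illustrative choices.
import random
import time


def sum_above(values, threshold=128):
    """Sum the elements >= threshold; the `if` is the data-dependent branch."""
    total = 0
    for v in values:
        if v >= threshold:
            total += v
    return total


def best_time(fn, data, repeats=5):
    """Best-of-N wall-clock time in seconds, to reduce scheduling noise."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


random.seed(0)  # fixed seed so runs are comparable
data = [random.randrange(256) for _ in range(200_000)]
sorted_data = sorted(data)

if __name__ == "__main__":
    print(f"unsorted: {best_time(sum_above, data):.4f} s")
    print(f"sorted:   {best_time(sum_above, sorted_data):.4f} s")
```

Both calls compute the same sum, so any timing difference comes from how the data is ordered rather than from the amount of work.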
@walter0bz 8 years ago
How the hell did that ISA end up being our standard?
@ChristofferViken 8 years ago
+walter0bz Bit by bit
@walter0bz 8 years ago
Apple managed to change ISA... twice. Seemingly paradoxical that they ditched the 68000 (the most elegant CISC) for PowerPC (a decent RISC), but simply by virtue of the world's momentum they ended up having to go to an inelegant CISC (x86).
@ChristofferViken 8 years ago
walter0bz That happens to provide the best performance per watt of power consumption. Only a few RISCs exceed x86 in that respect, and the few that do are severely capped.
@ChristofferViken 8 years ago
And before you say it: Yes, it has everything to do with the amount of money they can put into the research for making it more efficient, and it will stay that way for a long time. Because they are the ones with the R&D money to make it stay true.
@eformance 6 years ago
Huh, at 31:04 it sounded like you said "most instruction sets have a decrement and jump if not zero, but x86 doesn't". I'm confused because that's exactly what loop does, it decrements CX and jumps if not zero.
@biggiefrosty 5 years ago
He meant a single instruction that does the compound operation "decrement and jump if not zero". That's two instructions in x86, but they really function as a unit that can and should be scheduled together. So internally, the chip fuses those two operations into a compound operation, even though it's still coded in assembly as two separate instructions.
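To make the fusing idea concrete, here is a toy peephole pass over a list of instruction tuples. It is purely illustrative: the tuple format and the `dec_jnz` name are invented for this sketch, and a real front end fuses during decode under its own, more detailed rules rather than by rewriting a list.

```python
# Toy "fusion" pass: an adjacent dec + jnz pair becomes one compound operation.
# Instruction format and the name "dec_jnz" are HYPOTHETICAL.


def fuse_dec_jnz(instructions):
    """Return a new list where each adjacent ('dec', reg), ('jnz', target)
    pair is replaced by a single ('dec_jnz', reg, target) entry."""
    fused = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if cur[0] == "dec" and nxt is not None and nxt[0] == "jnz":
            fused.append(("dec_jnz", cur[1], nxt[1]))
            i += 2  # both original instructions are consumed
        else:
            fused.append(cur)
            i += 1
    return fused


if __name__ == "__main__":
    program = [
        ("mov", "ecx", 10),
        ("add", "eax", "ebx"),
        ("dec", "ecx"),
        ("jnz", "loop_top"),
    ]
    for op in fuse_dec_jnz(program):
        print(op)
```

The program as written by the assembler still has four instructions; only the internal stream gets shorter, which is the point made in the comment above.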
@aquaz_eu 8 years ago
Link for slides : gotocon.com/dl/goto-chicago-2014/slides/MattGodbolt_X86InternalsForFunAndProfit.pdf
@ankushm3t 3 months ago
Thanks!
@mumblic 8 years ago
Could be a great video but, as mentioned before, the edit is terrible. Stopped watching after 5 minutes. The link for the slides doesn't work!!
You mentioned the fetcher reads 16 bytes... That would make sense to me even though the max instruction size is 15, but if you put a valid 15-byte instruction on the last 15 bytes of an executable page (the next page is PAGE_NOACCESS or non-existent), it executes nominally, i.e. without causing a page fault, whereas if your process tried to read the 16 bytes it would trigger a page fault from the 16th byte being on the invalid page. So maybe it just reads 1 byte at a time and determines if it needs to read more? Or maybe it still tries to read 16 bytes but simply doesn't trigger faults during fetch.
@paranoiatable 5 years ago
Much of the content in this talk is difficult to understand simply because the viewer can't see the diagrams/code or schematics that Mr. Godbolt is referencing.
@rocqua 7 years ago
Interesting to see how few steps it actually takes to go from the information in this talk to Spectre. And yet, those steps took 3.5 years. It's obvious in hindsight, which goes to show how big the difference between 'obvious' and 'obvious in hindsight' is.
@AlexanderBollbach 8 years ago
I'm way out of my element here, but couldn't these chips expose some API / control mechanism for informing them about branching patterns in the compiled code, which back-end compiler people could target? This only comes to mind because the task of looking for patterns in branch behavior seems like something naturally in the domain of software and not hardware.
@DaveWhoa 6 years ago
Good compilers will structure code in such a way that the most-likely or most-common tests/branches are done first. There's no real need for CPU-specific knowledge to do that.
@jubbernaut 10 years ago
I will be watching this video one minute at a time, before having a lie down, in order to avoid nosebleeds. I'm two minutes in and it seems good so far - although the bloke doing the talk seems a bit shifty
@Ljosi 9 years ago
+robin jubber If you are having such a hard time grasping the concepts in this video, that means you are moving ahead too fast. Go back to the basics and move up slowly; that's what I did. Now I understand 90% of everything he explained here.
@jubbernaut 9 years ago
Heh - thanks Ljosi. I used to work with Matt - I was mostly just taking the piss :-) But you give good advice nonetheless!
@MrWogle1 6 years ago
Am I crazy, or did this guy almost stumble on Spectre?
@AxelStrem 7 years ago
2^32 bits is not really 4 GB... sry couldn't hold it in :)
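The nitpick is easy to check with integer arithmetic: 2^32 bits is 2^29 bytes, i.e. 512 MiB, while 4 GiB is what you get from 2^32 bytes (a 32-bit address space with byte addressing). A quick sketch, with variable names chosen just for this example:

```python
# Checking the units: 2**32 bits versus 2**32 bytes.
MIB = 2 ** 20  # bytes in one mebibyte
GIB = 2 ** 30  # bytes in one gibibyte

bits = 2 ** 32
bytes_from_bits = bits // 8      # 8 bits per byte -> 2**29 bytes
addressable_bytes = 2 ** 32      # 32-bit addresses, one byte per address

print(bytes_from_bits // MIB, "MiB")    # 512 MiB
print(addressable_bytes // GIB, "GiB")  # 4 GiB
```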
@featel1 9 years ago
I don't understand CPU architecture that well and my head is getting hit by terms I don't understand...
@greywolf271 6 years ago
You need intel inside...........
@desmondwilson3390 7 years ago
Chris Domas - Sandsifter
github.com/xoreaxeaxeax/sandsifter
s a n d s i f t e r : the x86 processor fuzzer
Overview
The sandsifter audits x86 processors for hidden instructions and hardware bugs, by systematically generating machine code to search through a processor's instruction set, and monitoring execution for anomalies. Sandsifter has uncovered secret processor instructions from every major vendor; ubiquitous software bugs in disassemblers, assemblers, and emulators; flaws in enterprise hypervisors; and both benign and security-critical hardware bugs in x86 chips. With the multitude of x86 processors in existence, the goal of the tool is to enable users to check their own systems for hidden instructions and bugs.
@bsuperbrain 2 years ago
the audience on youtube would have liked to see the slides not the guy talking
@edgeeffect 7 days ago
That's not just "the guy"... that's Matt Godbolt!!! ... but you're right about the slides! :)