Professor of computer science here. Nice work. I loved Casey's exposition. I think Casey is being too conservative in his criticism. The idea that fewer instructions is better is an argument from 1980s RISC proponents, excusable in 1980, but today we know that's simply not true. If fewer instructions were always better, we would have seen RISC architectures (the ones that people actually used back in the day, such as PowerPC) stay largely static. That never happened: the instruction sets and transistor counts of the PPC grew from generation to generation. The original author also misses the fact that even if we are sacrificing die space to implement instructions, you can't just consider the consumption of die space "bad". Implementing something on die can result in a huge performance increase. When the x86 ISA was extended to add AES operations (first... second? generation i7s?), the result was a 10x improvement in performance. Given how massively AES is used, who in their right mind would consider that a poor use of die space? Also, while I don't know the etymology of "die" in CPU development for certain, I suspect it comes from the use of the term in machining, where it can refer to any purpose-made tool; e.g., a letter from a type case in an old-fashioned printing press would be called a "die". Some of those were eventually made using photolithographic processes.
@litium1337 · 9 months ago
"Die" is just inherited from the manufacturing process of chopping something into many smaller pieces, or "dicing": a large piece of silicon gets diced, so the smaller pieces of silicon become die. There is no big mystery. Same as dicing an onion.
@excitedbox5705 · 9 months ago
I think it is called a die because you dice a wafer into many die, not because of the printing "pattern" die.
@shableep · 9 months ago
I think the “problem” is more complex these days than raw performance. When it comes to making portable workstations, raw performance is a factor, but it is not the only factor. I think ARM is winning because of a balance between performance and performance per watt. The smaller instruction set allows more efficiency gains, and thanks to modern 21st-century programming toolchains and compilers, the disadvantages of a smaller instruction set are not nearly as costly.
@hctiBelttiL · 9 months ago
@@excitedbox5705 Or maybe because when you print the dies you are casting a metaphorical die (old English singular for dice) that determines how many individual units (cores) of that die are error-free at the targeted frequency? I'm probably overthinking it.
@adrian_b42 · 9 months ago
Having thousands of rarely used instructions on die, multiplied by the number of cores (dozens today), can be wasteful. There are good use cases like your AES example and bad ones. The rare instructions can be avoided by compilers and replaced with decent equivalents. Decoding consumes transistors and power, and thermal throttling is a fact. Where does "die" come from? Remember the times when circuits were printed on a board with the traces covered in paint, and then the board was treated with ferric chloride (FeCl3) to remove the uncovered copper (etching)? Now it is done with EUV light, but it started with die 50-60 years ago.
@raidensama1511 · 9 months ago
@ThePrimeTime this was S-tier material! Please have Casey back.
@saturdaysequalsyouth · 9 months ago
What are the tiers?
@follantic · 9 months ago
SABCDEF
@CaseJams · 9 months ago
True professional
@chickenonaraft508 · 9 months ago
I second this
@nullpointer1284 · 8 months ago
This!!
@ketchrahalvard8134 · 9 months ago
As a chip designer, I would like to point out that when any article like this comes up about dropping x86, what they really mean is dropping the x87 floating-point extensions (the stack-based architecture that runs in 80-bit precision mode). That is specifically what the new Intel spec is aimed at killing. For those of you interested in why, just think about how you would do register renaming when your register numbers are all stack-based.
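To make the register-renaming point concrete, here is a minimal Python sketch of the x87 register stack. It is a toy model, not how any real chip implements it: there are eight physical registers and a TOP pointer, and instruction operands are named relative to TOP as ST(i), i.e. ST(i) = R[(TOP + i) mod 8]. The class and method names are made up for illustration; only that mapping and the push/pop behavior of FLD/FSTP follow the x87 design. The same encoded name, ST(0), points at a different physical register after every push or pop, so a renamer has to track TOP before it can even tell which registers an instruction reads.

```python
# Toy model of the x87 register stack (simplified: no tag word, no exceptions).
class X87Stack:
    def __init__(self):
        self.regs = [0.0] * 8  # physical registers R0..R7
        self.top = 0           # TOP field of the FPU status word

    def physical(self, i):
        """Index of the physical register that the name ST(i) refers to right now."""
        return (self.top + i) % 8

    def fld(self, value):
        """Push (like FLD): decrement TOP, then write the new ST(0)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fstp(self):
        """Pop (like FSTP): read ST(0), then increment TOP."""
        value = self.regs[self.top]
        self.top = (self.top + 1) % 8
        return value


s = X87Stack()
s.fld(1.0)
first = s.physical(0)   # ST(0) is R7 after one push
s.fld(2.0)
second = s.physical(0)  # ST(0) is now R6; the older value has become ST(1)
print(first, second, s.physical(1))  # 7 6 7
```

Out-of-order hardware has to resolve ST(i) to a physical register using the current value of TOP, and TOP itself changes as instructions execute. This model ignores how real chips do that.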
@OpenGL4ever · 9 months ago
Then I have a question. I've also heard that Intel would like to throw out some things. But if a CPU has, say, 8 cores, would it be possible to drop those things on 7 of the cores and keep them on just one? The old stuff from the DOS era in particular was never written for multicore CPUs anyway. This old software would therefore only need this one full-fledged core.
@stevesether · 9 months ago
@@OpenGL4ever I had a similar thought. If you suddenly take away the x87 FP stack, what software is suddenly going to break without being recompiled? It might make a lot more sense to just de-emphasize these old instructions and keep them working, just not as fast.
@asm_nop · 8 months ago
@@stevesether I don't know what Intel's proposed solution is, but I imagine they have a way to hook those instructions at execution time and deal with them. Since they're very old instructions, they have the benefit of only occurring in very old code. Sure, you could use a ridiculously complex decoder to convert them, but you could also do something crazy like trap to the operating system and have an OS process decompile the code, rebuild it into equivalent compatible instructions, and link it back into the original executable. The first run might be slow, but the second time would be real fast.
@Folsomdsf2 · 8 months ago
Yeah, unfortunately the article author and even the commentators have reasons not to be entirely... honest, so to speak.
@giornikitop5373 · 8 months ago
Makes sense. These are very old, and I don't believe they've been used since the Pentium era, if ever. x86 compatibility goes a long way, but I guess removing them is on the safe side, since MMX/SSE were being used instead. There is also some other legacy stuff that can be removed safely. As for the renaming, isn't Intel already using, for lack of a better term, indexed locations for the registers? Maybe you can shed some light here, because I really don't understand exactly what they do, if that holds any truth.
@Maxible · 9 months ago
This video was exceptional! Loved diving into the weeds. Also, kudos to your guest for having that board setup. Super helpful and so awesome!
@admiral_hoshi3298 · 9 months ago
TLDR: More complex does not mean slower.
@remboldt03 · 9 months ago
Yeah, pretty much. The most complex solutions had to be found to make stuff faster.
@julkiewitz · 9 months ago
If anything, the TLDR is: the article is wrong because the author doesn't know what they are talking about.
@bits360wastaken · 9 months ago
@@julkiewitz Did you read the article? It was about how ancient, rarely used instructions and the sheer bulk of instructions take up valuable space and increase complexity. The only time "fast" was mentioned was when they said speed was their only priority.
@henry_tsai · 9 months ago
@@bits360wastaken But modern chip designers also know that, and they only allocate the bare minimum of die space and power to those obsolete instructions. Deleting those instructions won't yield any visible change.
@LiveErrors · 9 months ago
I think AMD's 3D V-Cache shows that
@brainforest88 · 9 months ago
I'm an ancient too. Programming professionally since 1988 :D
@joshuatye1027 · 9 months ago
Congrats
@nezbrun872 · 9 months ago
My first paid programming job was in 1979, but I wrote my first program in Algol on paper tape in 1976. I really like this guy because he speaks my language and calls out the downsides and very real practical impact of today's fashionable sacred-cow practices.
@veritypickle8471 · 9 months ago
ty for your service
@huso7796 · 9 months ago
Oh cool, like honestly. What were you programming? How was the debugging process without fancy IDEs? What was it like to describe your job to other people? If you don't mind, could you elaborate on how different it was compared to today's way of software development?
@cylian91 · 9 months ago
As old as Turbo C! (Anyone know a good decompiler for DOS?)
@tenisviejos · 9 months ago
You know a person is really smart when they can break down complex concepts to other people. The pipeline explanation was *chef's kiss*
@XDarkGreyX · 9 months ago
Chat went on about the transfer between the machines, which I noticed too but... should he have addressed that when it comes to the hardware pipeline?
@proceduralism376 · 9 months ago
@@XDarkGreyX The transfer would basically be instant, it's just a bunch of clocked latches that separate each stage
@ApplesOfEpicness · 8 months ago
The laundry machine analogy is like the standard go-to for explaining pipelining. The buffet analogy also works.
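The laundry analogy maps directly onto a small timing model. This Python sketch assumes one machine per stage, a basket (unlimited waiting room) between stages, and loads that all take the same time per stage. The function names and the 30/60-minute figures are illustrative assumptions, not from the video.

```python
def sequential_time(n_loads, stage_times):
    """One machine does every stage of a load before starting the next load."""
    return n_loads * sum(stage_times)


def pipelined_time(n_loads, stage_times):
    """One machine per stage; each load moves to the next stage as soon as that
    machine is free. Assumes unlimited room (a laundry basket) between stages."""
    free_at = [0] * len(stage_times)  # time at which each stage's machine is next free
    for _ in range(n_loads):
        ready = 0  # time this load is ready to enter the current stage
        for s, t in enumerate(stage_times):
            start = max(ready, free_at[s])
            free_at[s] = start + t
            ready = free_at[s]
    return free_at[-1]  # the last load leaves the last stage last


wash_dry = [30, 60]  # minutes; illustrative numbers
print(sequential_time(3, wash_dry), pipelined_time(3, wash_dry))  # 270 210
```

In this model the pipelined total works out to sum(stage_times) + (n_loads - 1) * max(stage_times). The first load pays the full latency, and after that the slowest stage (here, the dryer) sets the throughput.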
@BrunodeSouzaLino · 8 months ago
Learning how to teach is a skill which has nothing to do with the knowledge you're teaching.
@TheVoiceofTheProphetElizer · 8 months ago
@@BrunodeSouzaLino I feel as if millions of tenured researchers with teaching loads cried out, then were suddenly silenced. Perfect way to sum it up. If only the vast majority of people who teach realized it was so much more than verbally repeating something to a room full of 20-somethings.
@mansquatch2260 · 8 months ago
I looked it up on wikipedia. It's called a die, because: " Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon (EGS) or other semiconductor (such as GaAs) through processes such as photolithography. The wafer is cut (diced) into many pieces, each containing one copy of the circuit. Each of these pieces is called a die."
@xponen · 7 months ago
Paraphrased from ChatGPT when queried "semiconductor die etymology": The word "die" in this context comes from the Middle English "die," which referred to a small cube or object. In manufacturing, a "die" is a specialized tool used in industries like metal stamping and semiconductor fabrication to cut material.
@mansquatch2260 · 7 months ago
@@xponen Seems ChatGPT confused three different uses of the word and morphed them together.
@FriesOfTheDead · 7 months ago
@@mansquatch2260 You basically said "It's called a die, because: It's called a die", well done, slow clap.
@mansquatch2260 · 7 months ago
@@FriesOfTheDead it's called a "die" because it's one of many diced up things.
@EyebrowsMahoney · 6 months ago
@@FriesOfTheDead You think you're smarter than you actually are.
@timseguine2 · 9 months ago
One thing the author of the article doesn't seem to get is that if you follow the arguments they actually make to their logical conclusions, you don't end up with RISC-V or ARM; you'd more likely end up reinventing Itanium.
@mrbigberd · 8 months ago
You can't go that far because the halting problem gets in the way. By the last iteration, Itanium was a bog-standard architecture that basically ignored the whole VLIW aspect entirely.
@SimonBuchanNz · 8 months ago
Just encode the processor control lines directly! Pay no attention to how you actually read the instructions...
@mrbigberd · 8 months ago
@@SimonBuchanNz the ISAs don’t just vary in syntax, but in semantics and in guarantees. These guarantees must be fulfilled.
@acheleg · 8 months ago
Conroe
@hoeding · 9 months ago
Washer / Dryer metaphor for pipelining nailed it.
@Dongdot123 · 8 months ago
Damn right, we understood it so easily with that explanation.
@ylstorage7085 · 8 months ago
Ford's assembly line could have served better.
@Aberusugi · 8 months ago
Yeah, I finally have a way to describe the concept to other people. Very helpful.
@markteague8889 · 8 months ago
The fast-food drive thru is a pretty good one!
@_somerandomguyontheinternet_ · 8 months ago
Yup! Using that from now on!
@nowaymyname · 9 months ago
As someone who is learning x86 ASM at college right now, I feel like I've learned more from Casey in one hour than I have all semester. Please bring him back, awesome content! Full-time content creator Prime has so far not disappointed.
@OpenGL4ever · 9 months ago
You might also search for "The Intel 80386, part 1: Introduction Raymond Chen" and read part 1 to n.
@RetroPcCupboard · 8 months ago
They actually make you study x86 ASM at college these days? Back when I was at university they did cover ASM for a simple microprocessor (I forget which). But not the x86 architecture. That was in the late 1990s. ASM is irrelevant for most software developers these days. Compilers will typically produce more optimised machine code than you can do manually with ASM. Unless you really know what you are doing and there is a specific case that the compiler does badly at (or can't do). ASM is useful to teach you the inner workings of a CPU though.
@OpenGL4ever · 8 months ago
@@RetroPcCupboard Knowing assembler helps you understand what the compiler produces and how high-level languages work. I think this is very valuable knowledge, it's like Latin for languages.
@RetroPcCupboard · 8 months ago
@OpenGL4ever Sure. If you are interested in that. I think most developers these days don't really care how the compiler works or even what the inner workings of a CPU are. I actually find it fascinating. Despite the fact I have been a software dev for 25 years I am only now learning x86 assembly. I have an old Pentium MMX PC that I am using for the purpose. I realise that I could do it on a Modern PC. But I feel that a slower PC makes more sense for seeing the impact of ASM vs older compilers.
@Pootie_Tang · 8 months ago
@@RetroPcCupboard Man, some of us study computer engineering; how could we not study x86 ASM if we're studying how to develop those processors? =)
@cubbucca · 9 months ago
just got talked out of buying a Washer Dryer Combo
@XDarkGreyX · 9 months ago
@@squishy-tomato Is that a single person or someone in a two-person household talking?
@Cadaverine1990 · 9 months ago
@@XDarkGreyX Two-person? Have kids...
@cadekachelmeier7251 · 9 months ago
They're pretty great since you don't have to bother moving the clothes half way through. So you can throw a load in before bed or whenever. The main thing is that the drum for a dryer is about twice as big as a washer for a given capacity. So you can easily add too many clothes for it to dry well.
@michaelb4727 · 9 months ago
If you buy two combo machines, it takes up the same amount of space as a standalone washer and standalone dryer, but it's actually faster because you don't have to wait for the last drying cycle in the pipeline, and also you usually get bottlenecked by the drying time on most commercial machines, so you would effectively have two dryers. (Also, you can be more lazy and leave more clothes in the dryer overnight.)
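Here is a back-of-the-envelope check of this claim as a Python sketch. It assumes, hypothetically, that a combo machine washes and dries in the same times as the standalone units. The earlier comment about dryer drum size suggests that may not hold in practice. It also assumes the standalone pair runs as a two-stage pipeline with a basket in between. The function names and the 30/60-minute times are illustrative.

```python
def two_combos_time(n_loads, wash, dry):
    """Two combo machines, each washing and then drying its own share of loads."""
    rounds = (n_loads + 1) // 2  # ceiling of n_loads / 2
    return rounds * (wash + dry)


def washer_dryer_time(n_loads, wash, dry):
    """One washer feeding one dryer: a two-stage pipeline with a basket in
    between, so after the first load the slower stage sets the pace."""
    if n_loads == 0:
        return 0
    return wash + dry + (n_loads - 1) * max(wash, dry)


for n in (1, 2, 3, 4):
    print(n, two_combos_time(n, 30, 60), washer_dryer_time(n, 30, 60))
```

With a 30-minute wash and a 60-minute dry, four loads take 180 minutes on two combos versus 270 through one washer and one dryer, because the single dryer is the bottleneck. For a single load the two setups tie at 90.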
@PennsyltuckyPhil · 8 months ago
In uop terms, I'd have two dryers with different capabilities: a small spin dryer as well as the conventional dryer. My washer occasionally refuses to run the spin cycle, necessitating a branch to move the items into the spin dryer before the normal move to the conventional dryer.
@ristekostadinov2820 · 9 months ago
The person who wrote the article about RISC-V taking a piece of x86's market forgets to mention that no company that makes RISC-V processors uses it vanilla. SiFive not only designs SoC architectures, it also develops extensions (more complex instructions) to solve the problems it needs to. Maybe tiny microcontrollers use vanilla RISC-V, which is what makes them cost 20 cents (besides the free ISA), but for high-performance computing they do similar stuff to ARM/x86.
@nicostein9875 · 9 months ago
Having a good teacher successfully explain something novel to you feels like watching a magician.
@TheHighborn · 5 months ago
My university teachers wish they were as good as a random YouTube channel...
@brennan123 · 4 months ago
Loved this. I used to program almost daily in assembly language (MOS 6510, x86, Hitachi H8/SH) up until about the original Pentium days. I knew the opcodes and their encodings inside and out and could write simple assembly programs with just a hex editor (no assembler). Never really did any 64-bit stuff. It's fascinating how much things have changed since then. I never knew about the table showing how things can be scheduled on the different ports. Haha, this makes me remember the 320x200 screen resolution and needing to get the pixel's memory location (y*320 + x):
XCHG AL, AH (swaps the low 8 bits and high 8 bits, effectively multiplying by 256: y*256)
MOV BX, AX
SHR BX, 2 (shift right by 2 bits, divides by 4, so: y*64)
ADD AX, BX (adds y*256 + y*64 to get y*320)
All this because it was like 5x faster than a MUL by 320. Can't believe I can still do this from memory without looking anything up. lol
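The trick works because 320 = 256 + 64, so y*320 = (y << 8) + (y << 6). Below is a quick Python check over every pixel of a 320x200, one-byte-per-pixel screen. The function names are made up for illustration. There are two caveats about the assembly version. First, XCHG AL, AH only yields y*256 if AH starts at zero, which holds when y (under 256) sits alone in AL. Second, the original 8086 has no shift by an immediate count other than 1; that arrived with the 80186. On an 8086, SHR BX, 2 would be written as two SHR BX, 1 or as a shift by CL.

```python
WIDTH, HEIGHT = 320, 200  # 320x200, one byte per pixel


def offset_mul(x, y):
    """Byte offset of pixel (x, y) using a multiply."""
    return y * WIDTH + x


def offset_shift(x, y):
    """Same offset using shifts and adds: y*320 = y*256 + y*64."""
    return (y << 8) + (y << 6) + x


# Exhaustively confirm both forms agree on every on-screen pixel.
assert all(offset_mul(x, y) == offset_shift(x, y)
           for y in range(HEIGHT) for x in range(WIDTH))
print("max offset:", offset_shift(WIDTH - 1, HEIGHT - 1))  # 63999, fits in 16 bits
```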
@pbentesio · 9 months ago
Casey Muratori is on a short list of people who motivate me to keep learning. It is inspiring to see people this knowledgeable about the subjects I love.
@mjthebest7294 · 9 months ago
Him and Jon Blow are my top ones.
@dimitrioskantakouzinos8590 · 7 months ago
Where can I find more from Casey Muratori?
@alexandertownsend5079 · 6 months ago
The channel Low Level Learning @@dimitrioskantakouzinos8590
@alexandertownsend5079 · 6 months ago
Thor from Pirate Software is high on my list.
@tamertamertamer4874 · 9 months ago
I'm dyslexic. So I have a dyslexic dude reading for me lmaooo.
@cat-.- · 9 months ago
I'm not dyslexic, but I'd like to think I am to justify me reading very little
@andrewdunbar828 · 9 months ago
Man, I'm not dyslexic but I'm still the slowest reader I've ever met.
@ark_knight · 9 months ago
If it helps, you can think of it as "pipelining". If you were reading it yourself, you would only be reading. But since you are hearing him read it, you can go do other tasks. Cue multi-threading. (Or get entertainment out of it while still learning.)
@XDarkGreyX · 9 months ago
I can read fast but just as in school I may need to read a sentence 10 times even at low speed to even just barely get it.
@tamertamertamer4874 · 9 months ago
@@ark_knight lmaoooo true
@aliasjon8320 · 9 months ago
Are we also going to get an "X86 doesn't need to die" with Prime's face photoshopped onto Mercy from Overwatch as the thumbnail?
@MrHaggyy · 9 months ago
XD the mirrored Mercy from the upcoming season would fit great.
@technomancer75 · 9 months ago
While riding on a fake horse ;)
@XDarkGreyX · 9 months ago
@@technomancer75 It should be a cow he's riding on. He owned a cow once, or maybe still does.
@miroslavhoudek7085 · 9 months ago
I worked on this rocket launcher once, and the on-board software was compiled both for the ARM-based embedded computer in the rocket and for an Intel PC development workstation, for easier testing. So we produced both little-endian and big-endian code. And we were doing bitwise arithmetic and network transfers that had to work in both environments, all in Ada 2012. I'm still confused to this day.
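One common way to keep such code portable is to never depend on the host's native byte order and always serialize in an explicit one. Here is a minimal sketch in Python rather than Ada, for illustration only. The message layout (a 16-bit id followed by a 32-bit value) is hypothetical. In Python's struct format strings, the '!' prefix means network order (big-endian) with standard sizes and no padding, so the bytes come out the same on a little-endian or a big-endian host.

```python
import struct

# '!' = network byte order (big-endian), standard sizes, no alignment padding.
# Hypothetical message layout: unsigned 16-bit id followed by unsigned 32-bit value.
MESSAGE = struct.Struct("!HI")


def encode(msg_id: int, value: int) -> bytes:
    """Serialize in a fixed byte order, independent of the host CPU."""
    return MESSAGE.pack(msg_id, value)


def decode(data: bytes) -> tuple:
    """Inverse of encode(); behaves the same on any host."""
    return MESSAGE.unpack(data)


wire = encode(0x1234, 0xDEADBEEF)
print(wire.hex())  # 1234deadbeef
```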
@OurSpaceshipEarth · 8 months ago
NASA loves those CoC (PC-on-chip) ECC RAM [x3 machines in case of subspace-trekking bit-flip disagreement]
@cloakbackground8641 · 9 months ago
I've wondered from time to time if it'd be easier to just write μops directly and peel back the abstraction, but Casey explaining it as _compression_ made it suddenly make sense: CPUs are much more limited by data transfer than by processing.
@warpspeedscp · 9 months ago
You'd be going back to itanium and ps3 cell processor level if you did that, haha
@kainlamond · 7 months ago
@@warpspeedscp Hey, those Cell SPEs can still do some interconnect tricks that we're only now matching in speed. I wonder what the PS3 would have been like if the Cell CPU had 2 PowerPC cores.
@warpspeedscp · 7 months ago
@@kainlamond well, true...
@leeroyjenkins0 · 7 months ago
But does the latency matter that much if you're constantly executing stuff though? You care more about throughput than anything as far as instructions go. My main takeaway was more that the CPU has a front-end to translate the instructions and a back-end to actually execute them and it makes sense to let the CPU dispatch higher-level assembly instructions as it sees fit rather than micromanaging the way the instructions are executed. Basically assembly is the abstraction layer that allows CPU designers to change the microops however they want to optimize it without software needing to be recompiled every time they change the smallest thing on the CPU, so it makes sense to have it be very descriptive so it can translate into microops in the most optimal way for the current physical layout.
@seinfan9 · 5 months ago
As your program grows larger and more complex, you'd quickly realize that having to direct what happens with the architecture every step of the way becomes a nightmare of overhead that will have you spending much more time coding, potentially to a point that makes finishing the program a near impossible task if the goal is to speed up execution. Emulating the pipeline, whether or not you'd allow branch prediction and dealing with cache misses, handling interrupts and saving the pipeline state, stopping invalid memory accesses, the determination of allowing for different logical units to handle mathematical and Boolean processes simultaneously... Needless to say, it wouldn't be easier.
@sdwone · 9 months ago
So glad people like Casey are still out there fighting the Good Fight! Because the way things are going, computers and software development in general, will get so complicated that only an elite few will truly understand it all. And those elite few will have unprecedented power! So yes! I'm not saying that all developers need to get a degree in all this low level stuff... But that, the more of us that know, even roughly, how a computer actually works, the better!!!
@XDarkGreyX · 9 months ago
More and more people use hammers, but fewer and fewer know how to build or at least understand them? That applies to countless fields, but would it be a valid metaphor?
@sdwone · 9 months ago
@@XDarkGreyX Yeah... That metaphor sounds totally reasonable to me! 👍🏼 Particularly in this industry.
@teaser6089 · 6 months ago
It's a good thing. Programming shouldn't require the extent of knowledge that those who design the processors have. The democratization of programming through high-level programming languages is the best thing that ever happened to the landscape of development. Yes, high-level programming languages are often less efficient than low-level languages, but not everything has to be written as efficiently as possible; with current hardware we can prioritize ease of coding and readability over pure performance.
@kurku3725 · 20 days ago
@@teaser6089 No... the problem is that everything becomes super mysterious: what is a JIT, what is a stack, why does it have this strange limit, what is a heap, why are people talking about these things, how do compilers work, what is malloc, how does it do the thing, why is it even needed, and so on and so forth, why are segfaults a thing, blah blah blah. It's just a never-ending stream of buzzwords with weird explanations that kinda make sense but lack something real and understandable. Meanwhile you can just spend 5-10 hours of your time, learn a little bit of C and GDB (or just use Visual Studio), peek at the memory and bytes, and see with your own eyes what is actually happening, without these hand-wavy explanations, and everything will be super clear from then on. Like, freaking yeah: RIP, which is $pc in GDB. I can tell the computer `go there execute this thing` and it will do so. I can do it in the debugger interactively. I can edit memory myself, why not. I can get a segfault, catch it, and fix the bug just by writing the sequence of bytes directly into memory. My computer, my program; it does what I tell it to do. Go check out the Handmade Hero `intro to C section` from Casey's very old streams. He does not explain everything in super-duper detail, but after them you could probably create a simple JIT interpreter: you just need to define the semantics, write the parser, make the translator, spit out the code, mark the page executable, patch the bytes, and let it run from there. Rinse and repeat. That is just wow. I literally transcended during those 10 hours of watching his explanations and going through the debugger.
@P-39_Airacobra · 1 day ago
@@teaser6089 Making hardware more complex doesn't make software less complex. Compilers are becoming more and more archaic and untouchable because hardware developers expect to be able to do anything and just have software developers support it.
@channel11121 · 9 months ago
Casey was so disappointed when he didn't understand why little-endian was better, and also didn't care enough to understand it.
@andrewdunbar828 · 9 months ago
I think it only occurs to us when we've done assembly programming, or bit-banging level C programming.
@YaroslavFedevych · 9 months ago
@@andrewdunbar828 Or speak German: numerals from 21 to 99 are little-endian there. Or when we add/subtract/multiply on paper, somehow it's easier to do it little-endian.
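Here is a small Python sketch of two practical upsides of little-endian layout that this subthread is circling around. The values and the function name are illustrative. First, the low-order byte sits at the lowest address, so reading a narrower integer from the same starting address gives the truncated value. Second, multi-byte addition can walk upward through memory, carrying as it goes, just like column addition on paper.

```python
value = 0x0A0B0C0D
little = value.to_bytes(4, "little")  # bytes 0d 0c 0b 0a
big = value.to_bytes(4, "big")        # bytes 0a 0b 0c 0d

# Reading only the first two bytes (same start address, narrower type):
low16_le = int.from_bytes(little[:2], "little")  # 0x0C0D: the low half, as wanted
low16_be = int.from_bytes(big[:2], "big")        # 0x0A0B: the high half instead


def add_little_endian(a: bytes, b: bytes) -> tuple:
    """Add two equal-length little-endian numbers byte by byte, starting at the
    lowest address and carrying upward. Returns (sum_bytes, final_carry)."""
    out, carry = bytearray(), 0
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & 0xFF)
        carry = s >> 8
    return bytes(out), carry


total, carry = add_little_endian((0xFFFF).to_bytes(4, "little"), (1).to_bytes(4, "little"))
print(hex(int.from_bytes(total, "little")), carry)  # 0x10000 0
```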
@andrewdunbar828 · 9 months ago
@@YaroslavFedevych Ja, ich weiss es! (Yes, I know!) In English, numbers are big-endian, which can make writing functions that do things like decimal conversion, ASCII conversion, or adding thousands separators a bit unintuitive, or at least trickier. I started assembly on the Z80, which was little-endian, but lost the feel for it so much after moving to the big-endian m68k that little-endian never felt natural again.
@realmarsastro · 9 months ago
@@YaroslavFedevych Gimme 99 of them Luftballons, amirite? In Norway we can actually pronounce the numbers 21-99 both big-endian and little-endian. The little-endian variant is more common with older people; it's unfortunately dying out.
@arnesl929 · 9 months ago
Yeah, even I understood, I was a bit surprised by the lack of enthusiasm.
@Dom-zy1qy · 9 months ago
At the washer + dryer vs. washer/dryer combo analogy, I thought he was going to say the combo was faster because I always forget to move the laundry to the dryer when it finishes... so it ends up taking like 2 hours extra. It was a good analogy though. I actually didn't even know about micro-ops before this, but it makes a lot of sense.
@darekmistrz4364 · 9 months ago
I also wanted to say that I forget the laundry, or that you need a specific person to sit next to the washer so as not to forget to move it from the washer to the dryer.
@GeorgeGzirishvili · 7 months ago
1:01:24: Correction: "Reduced" in the Reduced Instruction Set Computer (RISC) doesn't mean fewer instructions but simpler instructions; in other words, instructions reduced in complexity or doing fewer operations per instruction, as opposed to the Complex Instruction Set Computer (CISC).
@lesterdarke · 9 months ago
I would be interested in you having this guy come on to talk more about the differences between ARM and x86. In particular, what actually makes ARM more energy efficient? Everyone always makes it sound like it's down to CISC vs. RISC, but this video makes it sound like that may not be the case.
@coolworx · 9 months ago
I'm not even close to being a programmer. I'm just an arborist who gets up at 5am and works in the orchards of the Okanagan. But don't get me wrong, I love the job I fell into. I'm a fruit tree doctor, and I've saved many a patient! But in the evenings I'm a hobbyist hack, building personal projects because it's fun. I fkking love this channel. And this episode was outstanding. Prime was asking a lot of the questions I was having, and the presenter was excellent. Oh, I'm sure I'll never put the information to direct use myself. But it's good to know how the things in your life actually work. I certainly have a better understanding of the term _microcode_.
@XDarkGreyX · 9 months ago
Huh, really interesting that you are here. Kudos. *clap clap*
@loo_9 · 9 months ago
Most of programming is techniques only an engineer truly needs. But computer science is a beautiful subset of math that anyone can appreciate. As an arborist, you are likely using the same optimization techniques that CPUs use without thinking about them: multitasking, task overlap, task reordering, pipelining, etc. It's just a definitive way to think about the world.
@coolworx · 9 months ago
@@loo_9 Like I said, I've been building side projects for a while. One of them is a nifty Node.js and NeDB tree-notes app that lets me keep track of progress and search for any keywords or time intervals. Now I want to get some charts going to show trends. I have almost 10 years of data that I've been gathering, including daily weather conditions. So... ya. I'm a tree surgeon by day and code at night. And the best part is, I have no deadlines, no HR directors, no tiresome teammates... ;-)
@lritzdorf · 9 months ago
> But it's good to know how the things in your life actually work. This. This is the reason I enjoy computing, from both a hardware and software perspective. Yes, the tech is cool, but what I truly love is understanding what the heck these magic boxes of lightning and sand actually _do._
@allmycircuits8850 · 8 months ago
As an arborist, you're officially a HACKER :)
@QuicksilverSG · 9 months ago
In reality, all generations of x86 and x64 assembly language have been emulated in microcode since the advent of the P6 (Pentium Pro). The underlying CPU hardware contains banks of interchangeable 32-bit and 64-bit registers, along with RISC primitive instructions that operate on them. Both x86 and x64 assembly language instructions are parsed by a CPU hardware interpreter that converts them into streams of microcode instructions that are speculatively executed in parallel by multiple internal execution units. This is the actual Intel "machine code", and it is not possible to program it manually. Human assembly language programmers can only use hardware-interpreted x86 and x64 instructions; the underlying Intel microcode is locked inside the CPU. Decoding x86 and x64 assembly language can actually run faster than an ARM CPU executing manually programmed RISC code. That's because Intel assembly language is more compact than RISC machine code and can thus be loaded more quickly from memory, which is often the limiting factor in code execution speed. Under the hood, both Intel and ARM CPUs are highly optimized RISC machines. The difference is that ARM assembly code is executed directly by the CPU, while Intel assembly code is virtualized and emulated by internal microcode.
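To illustrate the front-end/back-end split described here, this toy Python sketch "cracks" an x86-style read-modify-write instruction into simpler load/compute/store micro-ops. Everything in it is a hypothetical simplification: the tuple format, the temporary names t0/t1, and the bracket syntax for memory operands. Real micro-op formats are not publicly specified, and real decoders are far more involved.

```python
def crack(op, dst, src):
    """Split one two-operand instruction into load/compute/store micro-ops.
    Memory operands are written in brackets, e.g. "[rbx]". Toy format only."""
    uops = []
    if src.startswith("["):          # memory source: load it into a temporary first
        uops.append(("load", "t1", src))
        src = "t1"
    if dst.startswith("["):          # memory destination: load, compute, store back
        uops.append(("load", "t0", dst))
        uops.append((op, "t0", src))
        uops.append(("store", dst, "t0"))
    else:                            # register destination: compute in place
        uops.append((op, dst, src))
    return uops


for uop in crack("add", "[rbx]", "rax"):
    print(uop)
```

Here `crack("add", "[rbx]", "rax")` yields three micro-ops. That is the sense in which the x86 encoding acts as compression: one short instruction in memory stands for several internal operations.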
@astrixx · 9 months ago
That was a pretty useful explanation.
@theexplosionist2019 · 9 months ago
I read they use ~70-bit registers to store flags + GPRs.
@adrian_b42 · 9 months ago
You are right. The only complaints about CISC in x86 are the variable length of the instructions, which makes decoding more complex, and the rarely used instructions carried over the decades.
@mitrus4 · 9 months ago
Are you sure about your last point? With ARM you can start decoding instructions in advance, because all of them are 4 bytes each, so you don't need to wait for decoding to get the address of the next one. With x86, on the other hand, the variable length creates a dependency that doesn't allow for it. I think that can outweigh the higher raw count of simple instructions, especially if there is no code bloat and hardware instruction prefetching does its job well.
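The dependency described here can be shown with a toy Python sketch. The variable-length rule it uses is hypothetical and is not real x86 encoding: the low 2 bits of an instruction's first byte give its length minus one. With fixed 4-byte instructions every boundary is known up front. With variable lengths, each boundary depends on the length of the instruction before it. As far as I know, real x86 front ends soften this with tricks such as predecoding instruction lengths and caching already-decoded micro-ops; this sketch models none of that.

```python
def fixed_width_starts(code, width=4):
    """Fixed-width ISA: instruction k starts at k * width, so every boundary is
    known up front and several instructions can be decoded side by side."""
    return list(range(0, len(code), width))


def variable_width_starts(code):
    """Toy variable-length ISA (hypothetical rule, not real x86 encoding):
    the low 2 bits of an instruction's first byte give its length minus one.
    Each boundary depends on the previous instruction's length, so the
    boundaries are found one after another."""
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += (code[pc] & 0b11) + 1  # length is 1 to 4 bytes
    return starts


print(fixed_width_starts(bytes(12)))  # [0, 4, 8]
print(variable_width_starts(bytes([0x00, 0x01, 0xAA, 0x03, 0, 0, 0, 0x02, 0, 0])))  # [0, 1, 3, 7]
```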
@QuicksilverSG · 9 months ago
@@mitrus4 - Program memory is loaded from RAM into processor caches, which on Intel chips are divided into interchangeable 64-byte cache lines. The CPU instruction decoder has no need to calculate the RAM address of the next instruction; it relies on the program counter to automatically keep its local instruction cache full. If that instruction cache ever underflows, the instruction decoder will have to wait until the cache is refilled from RAM. However, CPU execution units and data load/store units can continue to process previously decoded instruction micro-ops, which proceed speculatively in parallel with instruction decoding. In practice, it is far more common for execution to stall due to logical or algorithmic dependencies than for speculative execution to outrun instruction decoding. With ARM CPUs, each machine code instruction is four bytes long. On Intel CPUs, the most common instructions take just one byte, though complex instructions can be up to fifteen bytes long. On average, Intel machine code tends to be about half the size of ARM code.
@piotrj333 · 9 months ago
This is a garbage article. First, x86 has also been a RISC architecture for a long time. CISC instructions become RISC-like operations internally in the CPU, and the fact that we don't use those directly costs us only 5%, at most 10%. That is essentially the entire cost of x86. Second, AMD Ryzen laptop chips in energy-efficient configurations can compete with Apple M processors, at least in work done per joule. There are some problems with idle draw, but those can be attributed to Windows and to the SoCs themselves (e.g. Apple has soldered RAM really close to the CPU), not to the architecture. Third, Spectre and Meltdown affected x86, ARM, and even IBM POWER.
@abbe96419 ай бұрын
Yeah, put Linux on a good mobile Ryzen machine instead and the difference, from what I've heard, is night and day, with battery life improvements in the tens of percent.
@gigitrix9 ай бұрын
I don't disagree, but that 5-10% on something that could be delegated to compilers in a post-Moore's-law world is more relevant than you might think.
@neoqueto9 ай бұрын
Modern ARM and x86 and even RISC-V ISAs are pretty much all nearly identical. They can all do the same things, and the notion that ARM is a "less complex" architecture because "ARM is RISC and x86 is CISC" should only be made fun of. However, we are at the point where we have to scrape the bottom of the barrel for minuscule gains.
@TheSulross9 ай бұрын
Yeah, I don't want the main RAM as a permanent, unchangeable fixture of my computer the way Apple does things now. Now, it is true that one could produce a lower-transistor-count x86 if one stripped out all of its legacy stuff and only implemented the instructions used by modern software and operating systems. That would be an interesting project (and I am suggesting cutting more than just the CPU coming up in 8086 real mode). After all, any modern CPU, given its performance, can more than adequately emulate vintage CPUs of the '70s, '80s, up to the mid-'90s, for those who want to play their favourite retro games. Retro computer emulators on ARM, like the Pi, prove this every day. So x86 doesn't need the transistor real estate dedicated to supporting stuff that is vestigial from 40 years ago. Get rid of that fat to lower the power draw. And aren't there too many redundant SIMD instruction sets? Why not trim them down to what has been in vogue in, say, the last decade?
@neoqueto9 ай бұрын
@@TheSulross especially with EFI being so commonplace, a lineup of "stripped down" x86 parts could maybe be viable... but then again they have hundreds of engineers wracking their brains all day long about performance and power efficiency improvements so it's not like they haven't thought of the same ideas as us smoothbrains have (and there's probably reasons they concluded they're stupid).
@reaktorleak898 ай бұрын
So someone bought a Macbook, loved the battery life without researching why it was so good, and then wrote a Kill x86 article?
@lostcosmos324512 күн бұрын
yup! pretty much like every internet journalist without any real investigation or research just spewing biased opinions instead of facts.
@carloslint99147 ай бұрын
2 cents: The last time Intel tried to break compatibility with x86 was the Itanium, which is now a footnote in modern CPU history. Most people never heard of it.
@Cmanorange9 ай бұрын
39:35 casual magic
@Nirsi9 ай бұрын
"I'm ancient", yeah sure. "I was programming professionally since 1995": well, you've been programming longer than I've been alive, you earned that title.
@darekmistrz43649 ай бұрын
He looks like he was born in 1988 at most. He must have been programming at the age of 7! No wonder he is a genius
@PennsyltuckyPhil8 ай бұрын
I thought the definition of ancient was knowing of EBCDIC and having an IBM 370 gold card within arm's reach.
@TheVoiceofTheProphetElizer8 ай бұрын
Unless you're solving math problems using vacuum tubes, I'm not sure what exactly you all are defining "ancient" as.
@PennsyltuckyPhil8 ай бұрын
@@TheVoiceofTheProphetElizer Well I guess we could go back to where a program's bug is a moth caught in a relay ...
@JacobBogers8 ай бұрын
I've been programming in assembly since 1983 (I was a 14-year-old kid back then)
@bzboii9 ай бұрын
(optimizing compiled language argument) People write for LLVM using their language, LLVM writes for the "API" of the CPU, which is the instruction set (like x86-64), the instruction set is in turn implemented in microcode, and the CPU can only "do" certain actual things at the end of the day. There are optimizations at every layer; it's turtles all the way down until the actual CPU architecture.
@darekmistrz43649 ай бұрын
And then you need to remember that all that computer is doing is juggling memory: from disk to ram, from ram to L3 cache, from L3 to L2, from L2 to L1, from L1 to CPU and all the way back to disk.
@theondono9 ай бұрын
The guys on the Oxide Computer podcast (ep 1) covered the reason for little endian with Jeff Rothschild, and apparently that choice had to do with earlier CPUs having 1-bit-wide ALUs. This meant that you had to "stream" the bytes into the ALU, and little endian helped simplify the carry logic.
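A small sketch of why least-significant-first ordering suits a streaming adder. For readability it works a byte at a time rather than a bit at a time (an assumption; a bit-serial ALU would do the same thing one bit per step), but the argument is identical: the carry flows from low to high, which is exactly the order the bytes arrive in.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two equal-length little-endian unsigned integers one byte at a time."""
    # Bytes arrive least significant first, which is the direction the carry
    # must travel, so each step needs only the current pair plus one carry.
    result = bytearray()
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & 0xFF)  # keep the low 8 bits of this position
        carry = total >> 8           # overflow goes to the next byte in the stream
    # The final carry-out is dropped, as in a fixed-width register.
    return bytes(result)
```

With big-endian input the adder would have to buffer the whole number before the first carry could be resolved.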
@warpspeedscp9 ай бұрын
It's nice that LE still ends up being useful when things are scaled up
@nahkh19 ай бұрын
For those who are curious, the tool used for cutting a specific shape out of a material is called a die: en.m.wikipedia.org/wiki/Die_(manufacturing) I'm assuming the cut pieces (e.g. CPUs) took on the name of the tool used to cut them over time. I'm also pretty sure that they don't use literal dies anymore to cut out the individual chips from the silicon wafer.
@blarghblargh8 ай бұрын
didn't find anything that's authoritative on this, but I did find a few results referring to the process of cutting up a larger item into square pieces being "dicing". the results of that are "dice". an individual item is a "die", and found results saying that was the etymology for integrated circuit dies. CPUs aren't stamped/cast/extruded, and never have been, which is why I am not sure the linked wikipedia die concept applies. But I can't fully argue either way. What I can agree with is it is related to the manufacturing process. We can be pretty sure of that.
@nahkh18 ай бұрын
@@blarghblargh the specific use of die I mean from the article is a stamping die. It's basically a matching set of "knives" with a complex geometry. The dies are pressed together with the material in between, and while I doubt that's how silicon chips are cut these days I can believe that's how it would've been done in earlier days.
@autarchex8 ай бұрын
@@nahkh1 Integrated circuits are batch manufactured in a grid pattern on a wafer of (most commonly) pure crystalline silicon. The individual product pieces are separated from the wafer ("singulated" or "diced") using a saw, which converts the wafer into a large number of identical small parts collectively called "dies" or sometimes "dice" (older usage) and one of these is called a "die," and this term in a semiconductor industry context always refers to the product and not the tool. There are a few other rarely used singulation techniques other than a saw, like laser cutters, waterjet cutters, even particle beams that can cleave the wafer by precisely imparting millions of crystal defects in a line. As far as I know we never used stamping die-cut (where "die" means tool, not the product) techniques though; silicon crystal is quite hard and brittle, and the wafers chip and shatter more resembling thin discs of glass than thin discs of metal. Nonetheless, I'm sure the terms share a common history. There are other contexts too, where die and dice refer to the product and not the tool that made them - for example, playing dice.
@rustyshaklford95578 ай бұрын
As someone who works in industrial stamping, I can confirm that dies (in combination with punches) do indeed cut things.
@haraldfielker46359 ай бұрын
20 years ago - same talk :) The solution in ~2000 was "Intel Itanium" 😂😂
@stepank19 ай бұрын
Yeah people blaming intel / AMD for "bloated" x86 and "tech debt" as if this decision was not made solely to please the customers is pretty incorrect
@AlecThilenius9 ай бұрын
I had this thought too. Intel tried to fix these issues in Itanium. It's now nicknamed Itanic because no one wanted to recompile their code to run on Itanium. I'm not at all a fan of Intel, having worked there for 2 years back in college many moons ago, but you can't solely blame them for x86 legacy.
@stevesether9 ай бұрын
@@AlecThilenius As I recall, Itanium never achieved the supposed performance gains, even if you DID recompile your code. In the Linux world, Debian alone supports many CPU architectures, including x86, MIPS, Arm, and PowerPC, so re-compiling isn't really a problem. I believe Itanium was supported. The servers themselves were freaking EXPENSIVE. I've honestly never seen one, used one, or worked anywhere that had one.
@timothygibney1598 ай бұрын
@@stevesether You underestimate how much technical debt from legacy DOS and Windows 95 software runs in modern businesses. No, compiling .deb packages for PowerPC or ARM won't help run accounting's old Oracle macros, written in VB5 back in 1998, that only run in Excel.
@methanbreather8 ай бұрын
@@AlecThilenius It wasn't just recompiling. Itanium is just a slow architecture. Every architecture that relied on the compiler to do the work for the CPU failed.
@Dungeonseeker1uk9 ай бұрын
Fun fact: Intel used the name Pentium instead of 586 in a stupid attempt to stop the clone market. They couldn't trademark a number, so they used Pentium, and later added the MMX extension, whose name was also trademarked. Of course AMD and VIA just used 586. AMD did support MMX in the K6, and created 3DNow! as its own response with the K6-2 (the K6 was AMD's answer to Intel's 686-class chips).
@kahnzo9 ай бұрын
I had completely forgotten that fact, but that's right!
@craigpeacock19039 ай бұрын
Ah, the K6... the first cpu I bought myself was the K6-2 600...
@betag24cn9 ай бұрын
AFAIK, AMD still has those instruction sets from the K6 era
@OpenGL4ever9 ай бұрын
@@betag24cn No, 3DNow! is obsolete and was removed from newer AMD CPUs. There was an agreement about the SSE family of instruction sets, so 3DNow! was no longer needed, and it was incompatible with SSE.
@betag24cn8 ай бұрын
@@OpenGL4ever now that you mention it, I haven't checked that in many years; you're probably right
@TurtleKwitty9 ай бұрын
I forgot those transparent board exist for a sec so when he started writing in the air I was so confused and amazed XD
@jewlouds9 ай бұрын
I was more impressed he was writing backwards
@tehwibe9 ай бұрын
@@jewlouds Nah, the camera is flipped horizontally
@XDarkGreyX9 ай бұрын
The scroll-up got me
@scito15417 ай бұрын
aka glass?
@TurtleKwitty7 ай бұрын
@@scito1541 Way too brittle; it's plexiglass of some kind, and they tend to have lights, if memory serves, so the writing can slightly glow with neon colors
@BilalHeuser18 ай бұрын
When I started to write assembly language for my TRS-80, which was Z80-based, I also learned about the Intel 8080, which is what the 8088/8086 CPUs were based on. Much of the register structure was duplicated on the Z80, but Zilog added extended instructions. Knowing that makes understanding the 8086 and subsequent generations that much easier.
@freyja58009 ай бұрын
One thing about the conclusion that stood out to me is that the article makes out-of-order, speculative and superscalar execution seem like bad things, while they are things you actively want in your compute hardware. To make things faster you can either a) do the same stuff, but faster, or b) do more stuff in the same time (i.e. faster clock speeds vs. parallel computation). While there definitely are situations where the first is the only way (dependent operations etc.), if you have the choice the second is always preferable, because the energy required to run at a lower speed but in parallel is much more favourable, to the point where it is useful to have dedicated low-speed hardware for certain tasks (accelerators; GPUs in particular are a prime example of that).
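The energy argument above can be made concrete with the textbook dynamic-power approximation P ≈ αCV²f. The sketch below additionally assumes that supply voltage scales linearly with clock frequency, so power per unit goes roughly as f³, and that the work parallelizes perfectly. Both are simplifying assumptions (leakage and minimum voltages break them in real chips), so the numbers are illustrative only.

```python
def relative_dynamic_power(frequency_scale: float, units: int = 1) -> float:
    """Dynamic power relative to one unit at full clock, under a toy model.

    Assumes P ~ C * V^2 * f with V proportional to f, i.e. P ~ f^3 per unit.
    """
    return units * frequency_scale ** 3


def relative_throughput(frequency_scale: float, units: int = 1) -> float:
    # Assumes perfectly parallel work: throughput scales with units * clock.
    return units * frequency_scale
```

Under these assumptions, two units at half clock match one full-speed unit's throughput (`relative_throughput(0.5, 2) == 1.0`) at a quarter of the dynamic power (`relative_dynamic_power(0.5, 2) == 0.25`).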
@leeroyjenkins07 ай бұрын
I don't think they are painted in a bad light, and I think the presenter here was just going in with the preconceived idea that the article is wrong, can make no valid point, and that everything they write is a criticism of x86. The article describes the fact that speculative execution is done on modern CPUs, and then tries to argue why having a more complex instruction set makes these things more difficult to do, namely because it's difficult to figure out which instructions depend on which other instructions when they're so complex. So "complex instructions make out-of-order execution slower (because there are more checks to do between two given instructions?)" seems like a valid point to want to make. I don't know that it's true, but I also will never know, because instead of discussing that we're arguing that the writer of the article knows nothing about anything.
@boptillyouflop3 ай бұрын
@@leeroyjenkins0 IRL most of the badness of x86 is determining instruction length (which usually gets cached nowadays) and 286 protected mode stuff (the mode in which Windows 3.1 worked). Once you split instructions that both load and do math (ex: add eax, [ebx + ecx*4] -> mov temp, [ebx + ecx*4], add eax, temp), basically every instruction reads 2 registers, writes 1 register and maybe updates flags. Which is why they didn't have *that* much trouble making x86 Out-Of-Order CPUs (PPro came out in 1995, one year after the PowerPC 603 but the same year as MIPS R10000 and PA-RISC PA-8000 and SPARC64).
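A toy sketch of the load/op splitting described above: an instruction with a memory source is cracked into a load into a temporary plus a register-only ALU op, after which each micro-op reads at most two registers and writes one. The `Uop` type, the `tmp0` name and the address parsing are assumptions for illustration; no real CPU names or encodes its micro-ops this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str
    dst: str
    srcs: tuple[str, ...]


def address_registers(mem: str) -> tuple[str, ...]:
    # "[ebx + ecx*4]" -> ("ebx", "ecx"); scale factors and numeric
    # displacements are ignored in this simplified parser.
    regs = []
    for term in mem.strip("[]").split("+"):
        name = term.split("*")[0].strip()
        if name and not name.isdigit() and not name.startswith("0x"):
            regs.append(name)
    return tuple(regs)


def crack(op: str, dst: str, src: str) -> list[Uop]:
    """Split a two-operand x86-style instruction into simple micro-ops."""
    if src.startswith("["):
        load = Uop("load", "tmp0", address_registers(src))  # mov tmp0, [mem]
        alu = Uop(op, dst, (dst, "tmp0"))                    # op dst, tmp0
        return [load, alu]
    return [Uop(op, dst, (dst, src))]  # register-only: already simple
```

For example, `crack("add", "eax", "[ebx + ecx*4]")` returns a load reading `ebx` and `ecx` into `tmp0`, followed by an add reading `eax` and `tmp0` and writing `eax`.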
@jaysistar27119 ай бұрын
Von Neumann architecture basically means that code and data can be in the same address space. Harvard architecture, like the 8051, has code and data in two separate address spaces.
@thewhitefalcon85399 ай бұрын
And Modified Von Neumann means they have separate caches.
@catcatcatcatcatcatcatcatcatca9 ай бұрын
To me this was taught with emphasis on the different buses: the instruction bus and the data bus can be accessed simultaneously and can have different widths. For example, 14-bit PIC controllers have a 14-bit instruction bus but an 8-bit data bus. Any modern microcontroller needs the ability to write instructions while running and execute them later: I don't want to rewrite the ROM of my laptop every time I install something. And the kernel needs some way to give userspace code access to the CPU (the alternative would be virtualisation, I think). Either way, the relevant concepts of privilege are so far removed from the hardware that the distinction between von Neumann and Harvard architectures isn't as meaningful as in their original definitions.
@ern0plus49 ай бұрын
The point of von Neumann's idea is not about address spaces, but about the representation of the program: if it's represented as data, we need no extra mechanism to handle it (load, save, execute). The executor unit (CPU) already has to access memory, loading and storing data, so if we represent the program as data, it's not a big deal: we can use the common mechanisms for it. I don't know how computers looked before this idea; AFAIK, programming was done by physically re-wiring the whole computer for the specific task (aka. program). Storing the program in memory is a brutally significant improvement: need a longer program? Just add memory. Bug in the program? Just modify some bytes, etc.
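A minimal stored-program machine illustrating the point above: instructions are just numbers in the same memory list as the data, so the ordinary "store" path can rewrite code. The three-cell instruction format and the opcodes `HALT`, `ADD` and `COPY` are hypothetical, invented for this sketch.

```python
HALT, ADD, COPY = 0, 1, 2  # hypothetical opcodes for this toy machine


def run(memory: list[int]) -> list[int]:
    """Execute a toy stored-program machine in place.

    Code and data share one memory list. Each instruction occupies three
    cells: opcode, destination address, source address.
    """
    pc = 0
    while memory[pc] != HALT:
        op, dst, src = memory[pc:pc + 3]
        if op == ADD:
            memory[dst] += memory[src]
        elif op == COPY:
            memory[dst] = memory[src]  # works on code cells just as well
        else:
            raise ValueError(f"unknown opcode {op} at address {pc}")
        pc += 3
    return memory
```

Running `run([COPY, 3, 9, HALT, 9, 10, HALT, 0, 0, ADD, 7])` first copies the value at address 9 (the `ADD` opcode) over the `HALT` at address 3, so the program patches itself and then executes `ADD 9, 10`, leaving 8 at address 9. A Harvard machine with a separate, read-only instruction memory could not do this.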
@Eugensson9 ай бұрын
There's also Mill architecture
@MrHaggyy9 ай бұрын
@ern0plus4 In the really old days, when computers were mostly mechanical devices like the Enigma machine, you had mechanisms for "instructions" and another medium, like paper cards, for "data". When they started moving from mechanical to electrical, they still handled instructions differently from data. It wasn't until the stored-program machines of the late 1940s that instructions and data were kept in the same memory. Today it's a bit odd, as with chiplets we go back to using different processes in the same package, so in a way we have different materials again. But basically no CPU today distinguishes between code and data. It might come back if people really do care about security: Harvard architectures are far more robust by design against injection attacks.
@jaysistar27119 ай бұрын
Actually, you have quite a few of us "ancients". I'm not even sure when I learned C, but 1996 is when I switched from DOS to Linux. I got a Windows machine because people seemed to have them, so I had to recompile/test on it.
@PixelThorn9 ай бұрын
What made you switch to Linux that early?
@idhindsight9 ай бұрын
@@PixelThornnot op but similarly old, when win95 came out, I felt it “hid” the true OS and hated it. When I heard about Linux in 99/2000, I was an instant convert.
@jaysistar27119 ай бұрын
@@PixelThorn Windows 95
@darekmistrz43649 ай бұрын
@@idhindsight Hiding isn't bad. It's just a different client use case. That helped Microsoft gain the popularity it has today.
@idhindsight9 ай бұрын
@@darekmistrz4364 ehh I didn’t say it was objectively bad. It sure as hell isn’t for me (and most developers)
@nezbrun8729 ай бұрын
The 8008 and 8080 8-bit registers are A, B, C, D, E, H and L, plus the PC (and, on the 8080, a 16-bit SP). H and L are commonly combined to make a 16-bit memory pointer. On the 8080, the B & C, D & E and H & L register pairs can be combined, allowing up to three 16-bit registers, typically for memory pointers, especially HL, and to a lesser extent DE and BC due to the non-orthogonal instruction set. Furthermore, on the 8080 you can also do in-place 16-bit increments and decrements on these register pairs, but the results don't affect the flags. HL can also be used as a 16-bit accumulator for 16-bit addition.
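A toy model of the 8080 register pairing described above: three 16-bit pairs built from 8-bit halves, an `INX` that wraps without touching the flags, and a `DAD` that adds a pair into HL and changes only the carry flag. The class and method names are invented for the sketch; only the register and instruction behaviour follows the 8080.

```python
class Regs8080:
    """Toy model of the 8080's 8-bit registers and their 16-bit pairs."""

    PAIRS = {"BC": ("B", "C"), "DE": ("D", "E"), "HL": ("H", "L")}

    def __init__(self) -> None:
        self.r = dict.fromkeys("ABCDEHL", 0)  # all registers start at zero
        self.carry = False
        self.zero = False

    def get_pair(self, pair: str) -> int:
        hi, lo = self.PAIRS[pair]
        return (self.r[hi] << 8) | self.r[lo]

    def set_pair(self, pair: str, value: int) -> None:
        hi, lo = self.PAIRS[pair]
        value &= 0xFFFF
        self.r[hi], self.r[lo] = value >> 8, value & 0xFF

    def inx(self, pair: str) -> None:
        # INX: in-place 16-bit increment that wraps at 0xFFFF; flags untouched.
        self.set_pair(pair, self.get_pair(pair) + 1)

    def dad(self, pair: str) -> None:
        # DAD: HL += pair, with HL as a 16-bit accumulator; only carry changes.
        total = self.get_pair("HL") + self.get_pair(pair)
        self.carry = total > 0xFFFF
        self.set_pair("HL", total)
```

Note that wrapping HL from 0xFFFF to 0 with `inx` leaves the zero flag alone, which is the quirk that makes loop counters in register pairs awkward on the 8080.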
@X.A.N.A..9 ай бұрын
GameBoy?
@jaysistar27119 ай бұрын
@@X.A.N.A.. No, but a few arcade machines. The Game Boy uses a Sharp-made CPU that is often described as a Z80 clone (it is really closer to an 8080/Z80 hybrid). The Z80 is related to the 8008, but not compatible, just as the 8086 is related to the 8008, but not compatible. The book called "Nailing Jelly to a Tree" is a good Z80 starting point.
@tconiam8 ай бұрын
I was looking for this comment! The special-purpose uses of the registers and the limited addressing modes are the 8080 legacy issue. By comparison, the Motorola 68000 series' consistent registers and addressing modes make programming the 68K a dream compared to Intel. Sadly, they couldn't keep up in performance and lost out to x86.
@andrewdunbar8289 ай бұрын
Even in the 8 bit days plenty of us used to program in machine code, simply because we didn't have assemblers. We had documentation of the bitfields of the instructions and addressing modes and wrote BASIC programs that put all the bytes we figured out on paper into memory.
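A small sketch of hand-assembling from instruction bitfields, as described above, using the 8080's `MOV dst,src` format `01 DDD SSS` and its standard 3-bit register codes (M means the memory byte addressed by HL). The helper names are made up for the example, and the DATA-line output simply mimics the "BASIC program that pokes bytes" workflow.

```python
# 3-bit register codes from the 8080 instruction set; "M" is the byte at (HL).
REG_CODE = {"B": 0, "C": 1, "D": 2, "E": 3, "H": 4, "L": 5, "M": 6, "A": 7}


def encode_mov(dst: str, src: str) -> int:
    """Hand-assemble the 8080's MOV dst,src using the bit pattern 01 DDD SSS."""
    if dst == src == "M":
        # That bit pattern (0x76) is HLT, not a memory-to-memory move.
        raise ValueError("MOV M,M does not exist: 0x76 is HLT")
    return 0b01_000_000 | (REG_CODE[dst] << 3) | REG_CODE[src]


def as_basic_data(program: list[int]) -> str:
    # Render bytes as a BASIC DATA statement, ready to READ and POKE.
    return "DATA " + ",".join(str(b) for b in program)
```

For example, `encode_mov("A", "B")` gives `0x78` and `encode_mov("B", "A")` gives `0x47`, so `as_basic_data([0x78, 0x47])` produces `"DATA 120,71"`.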
@6355748 ай бұрын
This sounds like medieval torture of computers. I wasn't aware someone could even program in machine code. Big copium.
@leeroyjenkins07 ай бұрын
@@635574 it's just assembler without names really, wildly inconvenient but not inconceivable
@leeroyjenkins07 ай бұрын
It's so funny to me that a computer would have BASIC but not an assembler
@sebastianhormann92614 ай бұрын
Motorola 68x assembler programmer here, absolutely dug that interview. People compare apples and oranges and don't know what they are talking about in terms of architectures.
@robgrainger53149 ай бұрын
Wonderful exposition by Casey there. I had a decent understanding of x86 architecture, but still managed to learn something, which is always a pleasure. Ps. Please remember to put links in your videos.
@kamikaz1k9 ай бұрын
This was so freaking helpful to understand. Prime, thanks for forming your relationship with Casey; Casey, you should definitely piggyback off more creators' reach to share your wisdom. This is a win-win-win arrangement. 👏
@jaysistar27119 ай бұрын
In the original design, AX (16-bit) is AH (8-bit) and AL (8-bit). That's the `A`ccumulator. AX overflows into the `D`ata register (DX, which also has DH and DL) for MUL and DIV instructions. There is a `B`ase index register and a `C`ount register. The other 4 are the Stack Pointer (SP), Stack Base Pointer (BP), Source Index (SI), and Destination Index (DI). Some instructions only work with certain registers in assembly (very CISC). EAX is 32-bit. In x86_64, RAX and the other Rxx registers are 64-bit, and those original 8 registers have general functionality.
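A sketch of the register layout described above, modelling RAX as a 64-bit integer. It shows that AL and AH are views into the low 16 bits, that writing an 8-bit part preserves the rest of the register, that (on x86-64) writing the 32-bit EAX zero-extends into the full RAX, and that 16-bit MUL spills the high half of the product into DX. The function names are invented for the illustration.

```python
MASK64 = (1 << 64) - 1


def write_al(rax: int, value: int) -> int:
    # Writing AL replaces bits 0-7 and keeps every other bit of RAX.
    return (rax & ~0xFF & MASK64) | (value & 0xFF)


def write_ah(rax: int, value: int) -> int:
    # Writing AH replaces bits 8-15 and keeps every other bit of RAX.
    return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)


def write_eax(rax: int, value: int) -> int:
    # On x86-64, a 32-bit register write zero-extends: the upper half of RAX
    # is cleared rather than preserved.
    return value & 0xFFFFFFFF


def mul16(ax: int, src: int) -> tuple[int, int]:
    # 16-bit MUL: DX:AX = AX * src, so the high half of the result lands in DX.
    product = (ax & 0xFFFF) * (src & 0xFFFF)
    return product >> 16, product & 0xFFFF  # (dx, ax)
```

For example, `mul16(0xFFFF, 2)` returns `(0x0001, 0xFFFE)`: the product 0x1FFFE does not fit in AX, so its top bit goes to DX.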
@thewhitefalcon85399 ай бұрын
64 is RAX
@plaintext72889 ай бұрын
Assembly level debugging and optimization are black magic
@virno694209 ай бұрын
Most calling conventions use RAX for the return value; the naming conventions are just legacy, and it's not actually used as an accumulator. RBX is not the base pointer, that's RBP; there is no RSB register. What instructions only work with certain registers? I believe this just isn't true, at least for the general-purpose 16. Maybe SSE, AVX, or XMM registers are instruction-specific, idk, we haven't gotten that far in my class yet.
@teodorkostov3659 ай бұрын
@@virno69420 "Base" in RBX doesn't mean stack base; it means memory base, which is what it was originally intended for. And some instructions do use the registers with their original intention, like DIV, REP, MOVS etc...
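A toy model of one of the implicit-register instructions mentioned above, `REP MOVSB`, assuming the direction flag is clear and ignoring segment registers. The instruction takes no operands: it always copies from the address in RSI to the address in RDI and counts down RCX, so the "count", "source index" and "destination index" names still describe what those registers do here.

```python
def rep_movsb(memory: bytearray, rsi: int, rdi: int, rcx: int) -> tuple[int, int, int]:
    """Model REP MOVSB with the direction flag clear.

    Copies the byte at [RSI] to [RDI], increments both pointers and
    decrements RCX, repeating until RCX is zero. Returns the final
    (rsi, rdi, rcx), matching the state the real instruction leaves behind.
    """
    while rcx != 0:
        memory[rdi] = memory[rsi]
        rsi += 1
        rdi += 1
        rcx -= 1
    return rsi, rdi, rcx
```

For example, copying 5 bytes from offset 0 to offset 5 in `bytearray(b"hello\x00\x00\x00\x00\x00")` leaves `b"hellohello"` and returns `(5, 10, 0)`.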
@jaysistar27119 ай бұрын
@@thewhitefalcon8539 I edited to clear some things up, and fixed that typo.
@MrDivinePotato8 ай бұрын
Great chat, love this kind of geeking out! I don't understand it fully but I feel like I got a slightly clearer picture from this.
@Maisonier8 ай бұрын
You should get an e-ink device for reading. I like the Boox Note Air 3C (Android e-ink tablet), but there are also the reMarkable, Supernote and Papyr.
@johnyepthomi8929 ай бұрын
Wow. I like this format. Very engaging and inviting.
@drooplug9 ай бұрын
Ok. The glass board was an awesome surprise. I've seen it before, but it's still cool. Someone needs to tell me how he scrolled the previous writing up.
@c0dy429 ай бұрын
it's probably just a massive piece of glass that can move up or down with the help of motors
@APaleDot9 ай бұрын
@@c0dy42 I believe it's a sheet of plastic that's rolled up on the top and bottom.
@Muskar29 ай бұрын
I take his course and he uses it all the time. I'm not exactly sure how he does it but probably a foot pedal. We saw the full height of the board in this video, so he has to wipe it when he needs to use more space than this.
@c0dy429 ай бұрын
@@APaleDot I thought that at first as well, but I think we would be able to see that, because then there would have to be a big piece of acrylic or glass behind it so he has a solid surface to write on.
@cybermuse69179 ай бұрын
Suggestion for the reason they call it a die: when coins were minted, you would use a die to strike a particular pattern into the metal being formed. I imagine it's merely a reference to the pattern being used.
@OurSpaceshipEarth8 ай бұрын
Except with more UV light and fewer heavy hammer head-bangers :)
@BrankoDimitrijevic0218 ай бұрын
Funny thing in reference to 1:02:55: files in a ZIP archive are compressed independently and you can "jump" to any of them using the "central directory", which is physically at the end of the ZIP archive. So you could "decode" files from a ZIP archive in parallel, instead of serially.
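A minimal sketch of the parallel extraction idea above, using Python's standard `zipfile` module and a thread pool. `zipfile` reads the central directory when the archive is opened, so every member's location is known before any member is decompressed. The helper names are made up, and giving each worker its own `ZipFile` over the shared bytes is a design choice to avoid sharing a file position between threads; whether threads actually speed things up depends on the workload.

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor


def make_archive(files: dict[str, bytes]) -> bytes:
    # Build an in-memory ZIP; each member is deflated independently.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_member(archive: bytes, name: str) -> bytes:
    # Each worker opens its own ZipFile over the same bytes, so no file
    # position is shared between threads.
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


def extract_parallel(archive: bytes) -> dict[str, bytes]:
    # The member list comes from the central directory at the end of the
    # archive, so all members can be handed out before any are decoded.
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        names = zf.namelist()
    with ThreadPoolExecutor() as pool:
        contents = pool.map(lambda n: read_member(archive, n), names)
    return dict(zip(names, contents))
```

This contrasts with a single compressed stream (e.g. one big gzip), where later bytes can only be decoded after everything before them.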
@Ruuod8 ай бұрын
17:20 Isn't that paragraph meant to describe two things? 1. There are CPUs with multiple cores that have an ALU and cache for each core and can therefore work through instructions in parallel. 2. When code is compiled, instructions can be rearranged to optimize runtime; for example, when x = a + b has to wait because a and b are still 'in calculation', other instructions like loading new data can be executed in the meantime. Or am I wrong here? Architecture classes are something I had like 10 years ago.
@TheJukeJuke29 ай бұрын
I saw you drop a Wheel of Time reference in a video earlier today and now I have to watch you daily.
@UltimatePerfection9 ай бұрын
Unfortunately that's not going to happen because of all the business software and games that require x86. So any "replacement" would need to be backwards-compatible with x86, at which point it would be just x86 with extra steps, so why bother?
@jaysistar27119 ай бұрын
I don't think so. The Apple Macintosh started on the M68000, then switched to PowerPC with support (an OS built-in emulator) for M68000 apps, then switched to x86, which still supported PowerPC apps through Rosetta, then they switched to x86_64, which you may think was an easy jump, but probably required quite a bit of code page management in the OS, then they switched to ARM. Although they've dropped some of that emulation along the way, you can still get it, and things can still work as they always have. I think Windows and Linux could do the same. With QEMU in Docker, containers already do some emulation with buildx, so it's not too far of a jump.
@UltimatePerfection9 ай бұрын
@@jaysistar2711 I assure you that gamers unwilling to take the performance hit associated with emulation will stop the switch dead in its tracks. With Mac, the thing is that a) it's a single machine by a single company, so people literally have no choice but to switch if they want to use the newest and best (debatable...) products, not thousands of machines by thousands of companies, some even being DIY affairs, and b) Mac never really did games very well. Try running Cyberpunk 2077 on a Mac. You can't. Or at least, you can't run it well.
@dragonproductions2368 ай бұрын
@@jaysistar2711 The only reason why apple can do arm is because it's a horrible monopoly and people complained about the very real issues the switch caused. You're basically saying "The company store can switch and deprecate currency every year, why can't the government?" The answer is that it only exists due to artificial reasons ( You being horribly in debt to the coal company or either being too dumb to leave apple or unable to due to them grabbing your work software in their malformed tentacles).
@jaysistar27118 ай бұрын
@@UltimatePerfection I agree that Apple products are overpriced versions of inferior hardware, but I disagree about a performance hit being required to exist if the PC switches to another CPU because I don't know of any native version of the x86_64 at this point; the whole x86_64 platform is emulated on both AMD and Intel chips. Also, REDengine (Cyberpunk 2077) is a cross platform engine. It can run on the Switch (ARM), but doesn't because GPU performance and capability (ray tracing, API support, etc.) is more important for games than what CPU ISA they use. I've ported hundreds of games commercially, and I can tell you that Apple's mistake was in making Metal the only option moving forward. That's just more work, and, as a game engine dev, we're an overworked bunch as it is.
@UltimatePerfection8 ай бұрын
@@jaysistar2711 Yeah... x86_64 being emulated on an x86_64 processor... Nice try, but no dice.
@CookieBeaker9 ай бұрын
Loved this! I know you can’t go hyper technical every day but please sprinkle it in every so often! This was highly educational and on top of that reducing misinformation the article (intention or not) shared.
@robertthornton21324 ай бұрын
I am a software QA guy and although I do not need to know this information, I have not watched a better video in months/years. Fascinating and educational. Even though I am still fuzzy on the lowest level of operations discussed, the levels above them now make more sense. I have had words and processes as ethereal concepts for years and I now understand them better. Thanks.
@IkethRacing5 ай бұрын
Playing devil's advocate here. For the washer/dryer example, if the washer/dryer combo takes up the same space and costs the same as a single washer or dryer, why not use 2 washer/dryer combos instead? Thinking about it, I guess the answer to my own question is the washer/dryer combo will always use more die space (the cost) than a single washer or dryer.
@robertfletcher89648 ай бұрын
the number of people who think he's writing backwards is hilarious
@PaulPetersVids9 ай бұрын
Wow, maybe a top 5 video on the channel. This was amazing.
@darekmistrz43649 ай бұрын
What are other 4?
@PaulPetersVids9 ай бұрын
@@darekmistrz4364 lol idk.
@ScottGrunwald9 ай бұрын
Amazing Video!! Suggestion: Prime and Casey do a long-form video on all the details of ARM vs x64/86. I would love to learn more about the differences between the two and why current ARM chips like the M3 Max are just as performant as top Intel mobile chips yet insanely more efficient. I know about the node differences, but that can't be all of it. Is the reduced instruction set really giving them that much of an edge? This video was amazing and I learned a shit ton, but have more questions. Sounds like decoding performance might be a huge factor comparing the two.
@azazelleblack8 ай бұрын
It really is largely node differences. Apple is *two and a half* nodes ahead of Intel right now. Also, Intel has to design its CPU cores to work in everything from tablets to big iron servers. Apple has no such concerns. Finally, keep in mind that Apple also has full product stack control, so they make the hardware, the firmware, the drivers, the userland, etc. On the x86 side, every single part of that is done by a different company. So Apple can do optimizations that simply aren't practical in x86-land.
@lycanthoss5 ай бұрын
If ISA was the reason for those performance differences then Intel and AMD would have the exact same performance because they both use x86. ISA doesn't matter. Microarchitecture does. Nodes and microarchitecture is what decides most of the performance/efficiency.
@von_nobody9 ай бұрын
For decoding, yes, but in tight loops it doesn't matter much if the whole code block fits in the CPU cache and only needs to be decoded once. If 90% of your code's execution sits in a couple of hot spots, then the whole decoding penalty is 10 times smaller and will not matter much. It only matters for linear code where each instruction is hit once and it is a long time before the program comes back to that line.
@retronoby9 ай бұрын
There’s perhaps also a similar problem with the newest instructions (newer than the oldest supported CPU for a given piece of software). Most software doesn't use them, and detecting the supported instructions at runtime isn't always practical. The legacy instructions are there because software is a very expensive thing to produce, and corporations need to at least recoup the costs of old software before they either buy or produce new software. I wish Hackaday would focus more on DIY hacks like they used to, and less on opinion articles. Thank you for the great and in-depth explanations, it was very useful.
@craiggazimbi9 ай бұрын
The Name 🚀
@AregGhazaryan9 ай бұрын
fireship?
@wanishitcha11509 ай бұрын
the rocket man
@infastin37959 ай бұрын
Even Intel themselves tried to get rid of that legacy but didn't succeed.
@radfordmcawesome79479 ай бұрын
are you talking about itanium or something else?
@infastin37959 ай бұрын
@@radfordmcawesome7947 yes. Intel has also recently proposed a new X86S architecture.
@betag24cn9 ай бұрын
it was mostly Microsoft's fault really; Microsoft Windows is a legacy of code, nothing modern in it, not since Windows 2000 basically. With the ARM Windows project that should deliver some devices this year, perhaps that finally happens, without Intel it seems
@ThePlayerOfGames9 ай бұрын
x86_64-v2 as standard when?
@LouisDuran9 ай бұрын
soon
@williamhinz96149 ай бұрын
X86_64ex*
@deth30219 ай бұрын
Unlikely, intel has tried twice to change the x86 platform. Itanium, and pentium 4. Both failed, so anything they do will have to be backwards compatible.
@Kaznovx9 ай бұрын
In 2020. However, x86_64-v2 is a codename for CPUs with at least SSE4_2 and SSSE3. (which was supported already by 2008 CPUs). For comparison, x86_64-v3 means roughly support for AVX1 and AVX2 (CPUs from 2014 and later) x86_64-v4 is with support for AVX512, but this one is a can of worms
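A sketch of how the microarchitecture levels mentioned above can be checked against a CPU's feature flags. The feature sets follow the x86-64 psABI level definitions as best understood here, spelled the way Linux reports them in `/proc/cpuinfo` (SSE3 appears as `pni`, LZCNT as `abm`, and `xsave` stands in for the psABI's OSXSAVE requirement). Treat the lists as an approximation to double-check against the psABI; the function names are made up.

```python
# Levels are cumulative: v3 requires everything in v2, v4 everything in v3.
LEVELS = {
    "x86-64-v2": {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    "x86-64-v3": {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
    "x86-64-v4": {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"},
}


def cpuinfo_flags(text: str) -> set[str]:
    # Pull the space-separated feature list from the first "flags" line.
    for line in text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags: set[str]) -> str:
    level = "x86-64"  # the baseline, sometimes called v1
    for name, required in LEVELS.items():
        if required <= flags:
            level = name
        else:
            break  # a CPU cannot skip a level
    return level
```

On Linux, `highest_level(cpuinfo_flags(open("/proc/cpuinfo").read()))` would report the level of the running CPU under these assumptions.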
@deth30219 ай бұрын
@Kaznovx OK you meant that. But that is an unrelated topic, as it is about common baselines.
@Waitwhat4699 ай бұрын
28:30 I feel like that is a great example, but also in the actual laundry world, my rebuttal is that a special device that washes and one that dries with each taking up the same space as Washer/Dryer it would be 50% slower in total throughput. So, all I am saying is that we need to increase our WD core counts to dual core so we can finally wash/dry at a 50% increase!
@yanidoesit4 ай бұрын
"could possibly" "could possibly"... possibly could, bro... I'm allowed to pick on you as I'm just as dyslexic (or worse). Love ya bro, you make boring interesting.
@quas-r9 ай бұрын
Two phenomenal teachers talking about what they passionately love and explaining it to us is pure pure gold.
@Eren_Yeager_is_the_GOAT9 ай бұрын
i hate it that i only have 2 options when i want to buy an x86 CPU
@joseoncrack9 ай бұрын
Yes but it's better than no option at all.🙃
@betag24cn · 9 months ago
When we had VIA, it wasn't really an option. Right now you have two, but I wouldn't touch anything from Intel.
@betag24cn · 9 months ago
@@joseoncrack Well, with Apple there is no option and people are happy. On Android smartphones you don't choose the chip, you choose the whole device. It is nice to have options, but sometimes things just work like that.
@boptillyouflop · 9 months ago
Only 2 options for x86 CPUs is down to US government being feckless and derelict in its duty to break down monopolies (well, duopolies here but you get the gist). They let Intel/AMD use the x86 patent pool to completely own x86, to the detriment of everyone else. Citizens United lets big companies bribe politicians and this is how we got here.
@joseoncrack · 9 months ago
@@betag24cn It's good to have options. But yes, obsessing over it doesn't change a thing either: like, here, many people seem to absolutely despise the x86 architecture (often without even really knowing what it is now), but it just works well enough (at least in the desktop and server markets), and you'll be hard-pressed to find anything with the same performance in this price range. In terms of performance, one exception (really an exception for now) is the Ampere CPUs (128-core, ARM-based), and these are still niche products and very expensive. The day there are alternatives to x86 for the same kind of applications, with as much performance and for the same price, but with more "modern" architectures, people will eventually switch en masse. But that's just not the case yet. It is for mobile devices, and has been for years, though, which is a market Intel has always struggled with.
@swdev245 · 9 months ago
I was surprised that Prime seemed to be ignorant about the lower level (like virtual memory) and the x86 legacy stuff. Isn't at least the former part of every computer science degree?
@rusi6219 · 9 months ago
He could have forgotten; it happens.
@sheikhshakilakhtar1865 · 9 months ago
Not in every institute. Nowadays, lower-level stuff is studied more by electrical engineers than by computer scientists. Also, people forget.
@chupasaurus · 9 months ago
IIRC virtual memory explanation isn't a part of courses required for BSc in CS🙃
@monsterhunter445 · 9 months ago
@@bbourbaki You don't need to be a low-level guy unless you do embedded, and even then there is still abstraction.
@sheikhshakilakhtar1865 · 9 months ago
@@bbourbaki Do you know what fixed parameter tractability is?
@emilemil1 · 2 months ago
As for the legacy downside of the x86 instruction set, isn't there a cost in terms of how many bytes long instructions have to be? I'd naively assume you'd want the most frequently used instructions to use the fewest bytes, so that less data needs to be fetched from memory during execution, but you can't exactly rearrange your instruction encodings every time a new instruction is added, since that would break legacy compatibility.
@cthutu · 9 months ago
@ThePrimeTime Endianness is ONLY a thing when you write the value into memory. It determines which part of the value gets written first into memory. So, looking at JUST registers, AH is the top 8 bits of AX, and AL is the bottom 8 bits of AX. But... when you write it to memory on (little-endian) x86, the AL part is written first (at the lower address), followed by the AH part. Hope that makes sense to you.
@meneldal · 8 months ago
Outside of some very weird cases, memory uses a pretty wide bus for access, typically wider than the biggest numbers you'd be manipulating, so there's really no endianness there; it is all written at the same time.
@cthutu · 8 months ago
@@meneldal That's not true. The bus or register doesn't matter. It's purely about this: if the addressable element in RAM is smaller than the value you want to write to RAM, in which order do the elements go, most significant or least significant first? For example, if you write a 32-bit value to 8-bit-addressable RAM, do you store the high byte of the 4 first or last? That is endianness. Nothing to do with buses.
@meneldal · 8 months ago
@@cthutu What actually happens in the memory is hidden from you. The memory controller will typically give you a wide bus, and unless you go out of your way to ruin performance by sending byte-sized writes, it will take the whole word (or usually even multiple words at once) and then actually write the data to the cells. I'm just not following what you mean when you say first or last; you're writing the whole word at the same time, even if you could do smaller accesses.
@cthutu · 8 months ago
@@meneldal We're not talking about the same thing. Say you have a pointer to address 42 and you want to write a 32-bit value. One byte goes to address 42, another to 43, and so on up to 45. Which part of the 32-bit value goes to address 42 is the endianness. In big endian, the MSB of the 32-bit value is written to address 42 and the LSB to address 45. The reverse is true for little endian.
@n00blamer · 6 days ago
@@cthutu Right. One is talking about spatial order and other one about temporal order.
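The Python sketch below makes cthutu's address-42 example concrete. It models byte-addressable memory as a `bytearray` and lays a multi-byte value out in either byte order using `int.to_bytes`. The `store` and `dump` helpers are hypothetical. The sketch only shows the spatial order (which byte lands at which address). As meneldal points out, a real memory controller may commit all four bytes in a single transaction.

```python
def store(memory: bytearray, addr: int, value: int, size: int, byteorder: str) -> None:
    """Write `value` as `size` bytes starting at `addr` ("little" or "big" byte order)."""
    memory[addr:addr + size] = value.to_bytes(size, byteorder)


def dump(memory: bytearray, addr: int, size: int) -> str:
    """Show each address with the byte stored there, lowest address first."""
    return " ".join(f"[{addr + i}]={memory[addr + i]:02X}" for i in range(size))


little = bytearray(64)
big = bytearray(64)
store(little, 42, 0x11223344, 4, "little")
store(big, 42, 0x11223344, 4, "big")
print("little:", dump(little, 42, 4))  # little: [42]=44 [43]=33 [44]=22 [45]=11
print("big:   ", dump(big, 42, 4))     # big:    [42]=11 [43]=22 [44]=33 [45]=44

# The 8086 case from the thread: AX = 0x1234 means AH = 0x12 and AL = 0x34.
# A little-endian store puts AL at the lower address.
ax_mem = bytearray(2)
store(ax_mem, 0, 0x1234, 2, "little")
```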
@CrassSpektakel · 8 months ago
Today there are no RISC or CISC CPUs anymore outside the lowest performance class. All mid- to high-end CPUs are... microcode, which usually means "peephole-translating platform code into a VLIW internal microcode inside the CPU". With the Centaur C6 you could actually run x86 or MIPS code simply by exchanging the translation layer inside the CPU, which itself was microcode. People once suggested (Itanium???) simply using the internal VLIW code outside the CPU, without the layer of platform code, but VLIW code is HUGE. A simple "increment register A1" can easily take 256 bits. It doesn't make sense: it uses tons of RAM, cache and bus bandwidth. So if you have to use a compact intermediate code... why not use the one that is already widely used anyway? Therefore it basically doesn't matter whether your CPU is ARM, RISC-V or AMD64.
@darlokt51 · 9 months ago
Great talk! The article is kinda brain-dead, to be honest. The Chips and Cheese article captures it way better, and thanks to ARM lighting a fire under x86, X86S and AVX10 are coming. A bit on the RISC vs. CISC part: in general, architectures are converging, and the idea of RISC and CISC from a hardware perspective is truly stupid. RISC has turned a lot more CISC, and CISC learned from RISC. For the AI folks, you can see an ISA as tokenization for your chips. The frontend and backend are nowadays mostly decoupled, and a CISC-like architecture is generally better for branch prediction and prefetching. As such, ARM has become more CISC with every version, and x86 now has to remove its legacy 8-, 16-bit, etc. support to make the decoder fresh and new again, which is coming.
@dan_loup · 8 months ago
If you kill x86, computers will turn into the same stupid thing that Android is. Imagine having to crack your computer like a console just to run Linux on it, or even a "clean" version of Windows. It's quite a horrifying future.
@Bokto1 · 8 months ago
This is a valid concern, but on the positive side, we survived the UEFI transition, and I'd argue it mattered more to the IBM PC ecosystem than the ISA
@dan_loup · 8 months ago
@@Bokto1 It's about the complete package, indeed. And UEFI was probably only "survivable" because it had to compete with the good ol' MBR in terms of not being a locked-down hell.
@dan_loup · 8 months ago
@truegemuese It would be a bit harder to come up with a system that allows it to be locked down to the OS and still allow you to change the hardware, but it's sadly not impossible.
@alvarohigino · 8 months ago
But we already have arm single board computers and it's that way.
@Exilum · 9 months ago
My main reason for not reading much by myself is simply curation. I know I can't curate articles better than Prime's community. Also, I might have been programming for 14 years, but I know very well the flaws in my knowledge and experience, so I like having someone else I know the opinions of give their take. I won't always agree with Prime, but I know what Prime's opinions are, so I can know when my analysis is flawed simply by comparing opinions.
@colinmaharaj50 · 8 months ago
I was there live. I started my tech program in 1988 with the 8085. I got a job as a telecoms tech, but they had an 8086 XT, and I bought an 80286 and started with GW-BASIC / Turbo Pascal, then Turbo C, then C++. I am still with C++ 30 years later. I also did assembler. The telecoms section went bust, but only long after I moved to I.T. I wish I were still on the telecoms side doing PABX stuff. I was the first non-manager / non-executive staff member to get a PC and a laptop in TSTT.
@yarmgl1613 · 9 months ago
x86 has no reason to die, performance per watt is still very good and most software is compiled for it already. What needs to happen is the license for x86_64 to be available to any company wanting to implement it
@C.I.G.A.N · 8 months ago
It should not.
@spicynoodle7419 · 9 months ago
x86 is 40 years of tech debt
@darekmistrz4364 · 9 months ago
But also 40 years of performance optimizations
@spicynoodle7419 · 9 months ago
@@darekmistrz4364 why is it getting destroyed by ARM tho? It's more bloat than optimization
@darekmistrz4364 · 9 months ago
@@spicynoodle7419 Oh because it's so easy to compare x86 and ARM that x86 is getting destroyed? I didn't know I'm chatting with omnipotent being
@boptillyouflop · 9 months ago
As opposed to RISC which would have 30 years of tech debt by now.
@spicynoodle7419 · 9 months ago
@@darekmistrz4364 you are chatting with a being that has written x86 and ARM assembly. Yeah, ARM is superior in every way. I wish RISC-V was chosen as the new architecture though
@revenevan11 · 9 months ago
Thank you for covering this and for having Casey on! I'm fascinated by both the history and the technical minutiae of processors; too many people take the fact that we've "tricked rocks into thinking using electricity" for granted when there's such complexity within the hardware itself. In particular, I'm fascinated by the era during which ARM was designed, because they had more affordable and compact personal computers on which they could design the chips of the future. The processor industry bootstrapped itself on a macroscopic scale!
@etgaming6063 · 1 month ago
LMAO, I love the guy in chat: "SHUT UP! CASEY IS TEACHING!"😂
@davidspagnolo4870 · 9 months ago
Yeah, and we all need to use metric too.
@supdawg7811 · 9 months ago
Lil Endian is what they know me as in the streets
@TheOriginalSnial · 7 months ago
Enjoying this, but at about 6:00, the 8086's 8-bit regs are al, ah, bl, bh, cl, ch, dl, dh, because it was possible to individually reference both the lower and upper bytes of the first 4 regs: ax, bx, cx and dx.
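The Python sketch below illustrates that aliasing. AH and AL are just views of the upper and lower byte of AX, so writing one half leaves the other half untouched. The `Reg16` class is hypothetical and not taken from any particular emulator.

```python
class Reg16:
    """A 16-bit 8086-style register (e.g. AX) with 8-bit views of its halves (AH/AL)."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFF

    @property
    def high(self) -> int:  # e.g. AH
        return self.value >> 8

    @high.setter
    def high(self, byte: int) -> None:
        # Replace bits 15..8, keep bits 7..0.
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self) -> int:  # e.g. AL
        return self.value & 0xFF

    @low.setter
    def low(self, byte: int) -> None:
        # Replace bits 7..0, keep bits 15..8.
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Reg16(0x1234)
ax.low = 0xFF  # like `mov al, 0FFh`: AX becomes 0x12FF, AH is still 0x12
```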
@CountSessine · 4 months ago
I remember ages ago, in the waning days of our computer science program, explaining to my friend Dom all of the details of the x86 opcode map (I had a lot of experience with x86 assembly language at that point). He was just gobsmacked: he couldn't believe how inelegant it was and that it was the dominant architecture. It was a lot of fun explaining it to him.
@benjaminkemper5876 · 8 months ago
9:38 when you said 'ah ok I'm too stupid to understand that' and immediately moved on without seeking further clarification... that was thought for thought the exact same process in my mind lol, I felt that.
@Momi_V · 8 months ago
The way I like to think about it is that essentially all modern CPUs are JIT-compiling assembly to their own internal μOps. There are lots of reasons for that and half a century's worth of optimizations, but it boils down to having a stable API to the CPU's functions and hiding the implementation details, so you can change and improve them (like adding another ALU or completely overhauling your implementation, as with Bulldozer vs. Zen) without having to change the ISA for people to start benefiting from it. The other reason is real-time adaptive optimizations like branch prediction, out-of-order execution, prefetching, etc., which need a strong front end anyway, which (in essentially all relevant implementations) completely overshadows the actual decode effort (Chips and Cheese estimated something in the 1%-5% of core power range for decode). And all of that is true for high-performance ARM, RISC-V, etc. as well. The last time someone tried to do it without those optimizations was Intel Itanium, and that went horribly... That's not to say other ISAs don't have advantages: ARM can be stripped down enough to perform OK-ish on an in-order core like the Cortex-A53 if you really only have 0.5 W of power, and RISC-V can even be implemented on a microcontroller without it being any sort of "special version". Having an open license makes getting started (and education) easier, less legacy cruft makes implementing the MVP and testing everything less of a hassle (Intel is planning to drop a lot of that with X86S), and some ISA-level decisions are more elegant than others (Scalable Vector Extensions vs. the AVX-512 mess, for example). But it's not like x64 is completely doomed. AMD's processors are performing decently, even compared to Apple Silicon, especially considering the process node differences, target market (Zen scales up to hundreds of cores, TBs of RAM and a LOT of PCIe I/O) and OS-level influences (Windows vs. macOS). Intel is in a hole right now, but that does not mean x64 needs to die.
@Bokto1 · 8 months ago
Itanic was not the last time VLIW has been tried. Cell happened. Elbrus tries. You can still point at them and laugh.
@Momi_V · 8 months ago
@@Bokto1 Cell was rather purpose-built, so I didn't really think of it as an x86 competitor, but I was not aware of the Elbrus internals. There goes another evening... But hey, maybe someday there will be some kind of breakthrough. It's always good if at least someone is trying out something different.
@cristian505fr · 7 months ago
Hi, I'm a CS expert and this is my opinion. I think what the people said here is bad because bad things are bad. A good thing they said is that computing is the future of AI, and I totally agree with that as a CS expert. Good things are good, so I consider them good.
@TimSavage-drummer · 8 months ago
I started mucking about with the 6502 MPU purely to get a better understanding of how CPUs work. Much yak shaving later, I have an emulator of the MPU (and a basic system) written in Zig, and the biggest chunk of code is the table that maps instructions to micro-ops. Each operation takes a certain number of clock cycles (e.g. one clock cycle per micro-op). Did I need to do this? Maybe not, but I have a much better understanding of the details of a CPU, from instruction decoding, micro-ops and clock cycles to the timing of memory.
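For anyone curious what such a table looks like, here is a much-simplified Python sketch. It is not the commenter's Zig code. Each opcode maps to a mnemonic, an instruction length, a base cycle count and a handler function. Only four real 6502 opcodes are implemented: LDA immediate ($A9), TAX ($AA), INX ($E8) and NOP ($EA), each taking 2 cycles. The `CPU` class, the `OPCODES` table and the `run` loop are hypothetical. The model works per instruction rather than per micro-op or per cycle, and it leaves out the memory bus, page-crossing penalties and all flags except Z and N.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CPU:
    a: int = 0       # accumulator
    x: int = 0       # X index register
    pc: int = 0      # program counter (offset into the program bytes)
    z: bool = False  # zero flag
    n: bool = False  # negative flag
    cycles: int = 0  # total clock cycles consumed so far

    def set_zn(self, value: int) -> None:
        # Z is set when the result is zero; N mirrors bit 7 of the result.
        self.z = value == 0
        self.n = bool(value & 0x80)


def lda_imm(cpu: CPU, operand: int) -> None:
    cpu.a = operand
    cpu.set_zn(cpu.a)


def tax(cpu: CPU, operand: int) -> None:
    cpu.x = cpu.a
    cpu.set_zn(cpu.x)


def inx(cpu: CPU, operand: int) -> None:
    cpu.x = (cpu.x + 1) & 0xFF  # 8-bit register wraps around
    cpu.set_zn(cpu.x)


def nop(cpu: CPU, operand: int) -> None:
    pass


# opcode -> (mnemonic, length in bytes, base cycle count, handler)
OPCODES: Dict[int, Tuple[str, int, int, Callable[[CPU, int], None]]] = {
    0xA9: ("LDA #imm", 2, 2, lda_imm),
    0xAA: ("TAX", 1, 2, tax),
    0xE8: ("INX", 1, 2, inx),
    0xEA: ("NOP", 1, 2, nop),
}


def run(program: bytes) -> CPU:
    """Fetch, decode via the table, execute, and add up the cycle counts."""
    cpu = CPU()
    while cpu.pc < len(program):
        opcode = program[cpu.pc]
        if opcode not in OPCODES:
            raise ValueError(f"unimplemented opcode {opcode:#04x} at offset {cpu.pc}")
        _name, size, cycles, handler = OPCODES[opcode]
        operand = program[cpu.pc + 1] if size == 2 else 0
        cpu.pc += size
        handler(cpu, operand)
        cpu.cycles += cycles
    return cpu


cpu = run(bytes([0xA9, 0x05, 0xAA, 0xE8, 0xEA]))  # LDA #$05; TAX; INX; NOP
```

After that run, A is 5, X is 6 and the cycle counter reads 8. Growing the table (and splitting each entry into per-cycle steps) is where the bulk of a real emulator's code ends up, which matches the commenter's experience.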
@ArbitraryConstant · 7 months ago
56:10 I think the decoding issue is largely a question of x86 not being able to know instruction alignments (where each instruction starts) without decoding every instruction in order. x86 does have alignment predictors to try to deal with this, but they're not perfect. x86 also has to deal with corner cases like instructions spanning cache lines, and security issues like very long instructions potentially having a different instruction bit pattern embedded inside them, which can be used as gadgets in return-oriented programming (ROP), a generalization of the return-to-libc attack. Conversely, with an architecture like ARM, especially 64-bit ARM, every instruction is always 4 bytes and always aligned to a 4-byte boundary, so you can decode as many in parallel as you want. This is one reason Apple has been leading Intel in decode width in recent years, and higher decode width helps with power efficiency because you can run at a lower clock speed, and dynamic power goes roughly with clock speed cubed (since voltage has to rise along with frequency).
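The Python sketch below shows the boundary problem. The variable-length "encoding" is made up (the low two bits of the first byte give a length of 1 to 4 bytes) and looks nothing like real x86 prefixes/ModRM/SIB. The point is the data dependency. In the variable-length case, each start offset depends on the decoded length of the previous instruction. With a fixed 4-byte encoding like A64, every start offset is known up front, so several decoders can work on different offsets independently. All function names are hypothetical.

```python
from typing import Callable, List


def toy_length(code: bytes, pc: int) -> int:
    """Hypothetical encoding: (low 2 bits of the first byte) + 1 = length in bytes."""
    return (code[pc] & 0b11) + 1


def starts_variable(code: bytes, length_of: Callable[[bytes, int], int], start: int = 0) -> List[int]:
    """Sequential scan: instruction k+1 can't be located before decoding k's length."""
    starts: List[int] = []
    pc = start
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)
    return starts


def starts_fixed(code: bytes, width: int = 4) -> List[int]:
    """Fixed width: every start offset is a multiple of `width`, no decoding needed."""
    return list(range(0, len(code), width))


stream = bytes([0x00, 0x01, 0xAA, 0x03, 1, 2, 3, 0x02, 9, 9])
intended = starts_variable(stream, toy_length)    # offsets 0, 1, 3, 7
hidden = starts_variable(stream, toy_length, 2)   # offsets 2, 5, 8
```

Starting the scan at offset 2, which is in the middle of the intended second instruction, yields a completely different but still "valid" instruction stream. That is the same effect that makes unintended x86 instructions usable as ROP gadgets.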
@colton2432 · 8 months ago
I feel like the article's argument at 54:35 is that if you have a smaller set of instructions, it is easier to optimize the out-of-order processing done by the scheduler. However, as Casey mentioned right beforehand, these schedulers need to be incredibly fast, and therefore the algorithms used to identify dependencies are nowhere near as complicated as the author feels they should be.
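Here is a rough illustration of why dependency detection can stay simple. Once register renaming removes false dependencies, finding true (read-after-write) dependencies only needs a "last writer" table indexed by architectural register, with one lookup per source operand. The Python sketch below is minimal. The `Instr` type and register names are hypothetical, and real hardware does this for a whole group of instructions per cycle rather than in a sequential loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Instr:
    dest: str              # architectural register written
    srcs: Tuple[str, ...]  # architectural registers read


def raw_dependencies(window: List[Instr]) -> List[List[int]]:
    """For each instruction, the indices of earlier instructions producing its inputs.

    Only read-after-write dependencies are tracked; write-after-read and
    write-after-write hazards are assumed to be removed by register renaming.
    """
    last_writer: Dict[str, int] = {}
    deps: List[List[int]] = []
    for i, ins in enumerate(window):
        deps.append(sorted({last_writer[r] for r in ins.srcs if r in last_writer}))
        last_writer[ins.dest] = i  # later readers of dest now depend on instruction i
    return deps


window = [
    Instr("r1", ()),            # 0: r1 = load [...]
    Instr("r2", ("r1", "r3")),  # 1: r2 = r1 + r3
    Instr("r1", ("r4",)),       # 2: r1 = r4 * 2  (reuses r1; renamed in hardware)
    Instr("r5", ("r1", "r2")),  # 3: r5 = r1 + r2
]
deps = raw_dependencies(window)  # [[], [0], [], [1, 2]]
```

Instruction 2 depends on nothing, even though it overwrites `r1`, so it can execute alongside instructions 0 and 1.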
@TheDoomerBlox · 4 months ago
28:23 shoutouts to the bot br0xamson posting "L take" in stalwart protest against the crowd a modern hero
@NaishoTheNeko · 8 months ago
51:16 I believe a die comes from engravers. Where a person makes the negative by removing material to get the reverse image. From what I can tell, die and dye were originally spelled the same. Perhaps it was dye originally meaning what you do with it is dye a paper. Eventually the die changed in name and came to refer to the item itself.
@teropiispala2576 · 8 months ago
I have been programming since the mid-80s. I have lots of assembler experience on x86, as well as on most small-computer CPUs, and later on embedded systems and signal processors. I have also written bare-metal protected-mode code for early 386 processors. In the past few decades, I have mainly used C/C++ and done my work mostly on the Arm platform. I'm still not a big fan of the Arm architecture, although practical implementations have had their place in the automotive systems I work with. Performance has always been a struggle if you really need to do something heavy, and despite some challenges, there has been a need to move to the x86 architecture. In my opinion, Arm is a remnant of a short period when a simple instruction set was the key to higher clock frequency and performance, but those times are over. Nowadays it's all about parallel execution and efficient use of resources. Everything tends to draw power as long as the unit is powered. Silicon area is quite cheap, and adding parallel cores is a common way to increase performance. That is not an efficient use of resources, and it definitely doesn't offer the best peak performance. Too bad we are moving away from compiled languages, because there is a lot that could be done in parallel execution of normal code. Heavily pipelined architectures already do this to some extent, but only by trying to work out at runtime what the code is trying to do. Compilers could work closer to the microcode level and allocate computation units to parallel paths in the original code. A big part could be automatic and the rest controlled like OpenMP, but without the overhead of splitting execution across different physical cores. A single core could have many more arithmetic and logic units, and some of them could be shared with neighboring cores.
@snakeplissken111 · 8 months ago
I know this is approached from a different angle, but let me approach the topic from a consumer's: if x86 needs to die, what about current GPU technology? They've been arguing that x86 was dead for thirty years. Yet progress not only still exists, it also still results in better tech for roughly stable prices. For the same money that could buy you an Athlon 64 CPU 20 years ago, you can still get a current-gen Ryzen. Meanwhile, GPUs have "evolved" to the degree that you pay the price of a small Xbox for the entry-level tier, just a decade after even 100-dollar GPUs were viable and the midrange started at ~150 bucks. "Sinking prices are a story of the past": it doesn't get more "dead end" than this.
@romanstingler435 · 8 months ago
The term "die" for a silicon chip originates from the process of manufacturing these chips. Here's the connection: Starting material: Integrated circuits are typically fabricated on wafers, which are thin slices of electronic-grade silicon (EGS). Circuit formation: The desired circuit patterns are formed on the wafer surface using various techniques like photolithography and etching. This creates a network of tiny transistors, wires, and other components that make up the integrated circuit. Dicing: Once the circuits are formed, the wafer is cut into individual pieces, each containing a single copy of the circuit. Each of these pieces is called a die (plural: dice or dies). The term "die" comes from the process of dicing, which is similar to how dice are cut from a larger block of material. The wafer is essentially diced into smaller functional units, hence the name "die." Generated by Gemini