X86 Needs To Die

Рет қаралды 546,262

Күн бұрын

Пікірлер: 2 000

@jonathangraham5179 9 ай бұрын

Professor of computer science here. Nice work. I loved Casey's exposition. I think Casey is being too conservative in his criticism. The idea that fewer instructions is better is an argument from 1980s RISC proponents -- excusable in 1980 but today we know that's simply not true. If fewer instructions were always better, we would have observed RISC architectures -- the ones that people actually used back in the day -- such as the one used on the PowerPC stay largely static. That never happened, the instruction sets and number of transistors increased on the PPC from generation to generation. The original author also misses the fact that even if we are sacrificing die space to implement instructions you can't just consider the consumption of die space "bad". Implementing something on die can result in a huge performance increase. When x86 ISA was extended to add AES operations (first...second? generation i7's?) The result was a 10x improvement in performance. Given the massive use of AES, who in their right mind would consider that a poor use of die space. Also, while I don't know for certain the etymology of "die" in cpu developent. I suspect it's attributable to the use of the term in machining. Where it can be used for any purpose made tool. i.e., A letter from a type case in a old-fashioned printing press would be called a "die". Some of those were eventually made using photo-lithographic processes.

@litium1337 9 ай бұрын

Die is just inherited from the manufacturing process of chopping something into many smaller pieces or "dicing", a large piece of silicon get diced, so the smaller pieces of silicon become die, there is no big mystery. Same as dicing an onion.

@excitedbox5705 9 ай бұрын

I think it is called die, because you dice a wafer into many die, not because of the printing "pattern" die.

@shableep 9 ай бұрын

I think the “problem” is more complex these days than raw performance. When it comes to making portable work stations, raw performance is a factor, but it is not the only factor. I think ARM is winning because a balance between performance and performance per watt. The smaller instruction set allows more efficiency gains, and thanks to modern 21st century programming toolchains and compilers, the disadvantages of a smaller instruction set is not nearly as much of a cost.

@hctiBelttiL 9 ай бұрын

@@excitedbox5705 Or maybe because when you print the dies you are casting a metaphorical die (old English singular for dice) that determines how many individual units (cores) of that die are error-free at the targeted frequency? I'm probably overthinking it.

@adrian_b42 9 ай бұрын

Having thousands of rarely used instructions on die, multiplied by the number of cores (dozens today) can be wasteful. There are good uses like your AES example and bad use cases. The rare instructions can be avoided by the compilers, replaced with decent equivalents. Decoding consumes transistors and power, termal throttling is a fact. Where is die coming from? Remember the times when circuits were printed on a board with the traces covered with paint, then the board is treated with iron chloride FeCl3 to remove non-covered copper (etching)? Now it is done with EUV light, but it started with die 50-60 years ago.

@raidensama1511 9 ай бұрын

@ThePrimeTime this was S-tier material! Please have Casey back.

@saturdaysequalsyouth 9 ай бұрын

What are the tiers?

@follantic 9 ай бұрын

SABCDEF

@CaseJams 9 ай бұрын

True professional

@chickenonaraft508 9 ай бұрын

I second this

@nullpointer1284 8 ай бұрын

This!!

@ketchrahalvard8134 9 ай бұрын

As a chip designer I would like to point out that when any article like this comes up about dropping x86, what they really mean is dropping the x87 floating point extensions (the one that is a stack architecture and runs in 80 bit precision mode), The is specifically what the new Intel spec is aimed at killing. For those of you interested in why just think about how you would do register renaming when your register numbers are all stack based.

@OpenGL4ever 9 ай бұрын

Then I have a question. I've also heard that Intel would like to throw out some things. But if a CPU has, say, 8 cores, would it be possible to just throw those things away at 7 cores and keep them at just one core? Especially the old stuff that was used in the DOS era was never written for multicore CPUs anyway. This old software would therefore only need this one full-fledged core.

@stevesether 9 ай бұрын

@@OpenGL4ever I had a similar thought. If you suddenly take away the x87 FP stack, what software is suddenly going to break without being re-compiled? It might make a lot more sense to just de-emphasize these old instructions, and make them work, but not as performant.

@asm_nop 8 ай бұрын

@@stevesether I don't know what Intel's proposed solution is, but I imagine they have a way to hook those instructions at execution time and deal with them. Since they're very old instructions, they have the benefit of only occurring in very old code. Sure, you could use a ridiculously complex decoder to convert them, but you could also do something crazy like raise a flag to the operating system and flag the code to be decompiled and rebuilt into equivalent compatible instructions by an OS process, and link it back into the original executable. The first run might be slow, but the second time would be real fast.

@Folsomdsf2 8 ай бұрын

yah, unfortunately the article author and even the commentators have reasons to not really be.. honest so to speak.

@giornikitop5373 8 ай бұрын

makes sense, these are very old and i don't believe they ever been used, since the pentium era, if ever. the x86 compatibility goes a long way but i guess it's on the safe side, mmx/sse were being used instead. there are also some other legacy stuff that can be removed safely. as for the renaming, isn't intel already using, in lack of a better term, indexed locations for the registers? maybe you can shed some light here because i really don't understand exactly what they do, if that holds any truth.

@Maxible 9 ай бұрын

This video was exceptional! Loved diving into the weeds. Also, kudos to your guest for having that board setup. Super helpful and so awesome!

@admiral_hoshi3298 9 ай бұрын

TLDR: More complex does not mean slower.

@remboldt03 9 ай бұрын

Yeah, pretty much. The most complex soloutions had to be found to make stuff faster

@julkiewitz 9 ай бұрын

If anything it's TLDR: The article is wrong cause the author doesn't know what they are talking about.

@bits360wastaken 9 ай бұрын

@@julkiewitz Did you read the article? It was about how ancient rarely used instructions, and the sheer bulk of instructions take up valuable space and increase complexity. The only times "fast" was mentioned was them saying speed was their only priority.

@henry_tsai 9 ай бұрын

@@bits360wastakenBut modern chip designers also know that, and they only allocated the bare minimum amount of die space and power to those obsolete instructions. Deleting those instructions won't yield any visible change.

@LiveErrors 9 ай бұрын

I think amds 3d VRAM shows that

@brainforest88 9 ай бұрын

I an ancient too. Programming professionally since 1988 :D

@joshuatye1027 9 ай бұрын

Congrats

@nezbrun872 9 ай бұрын

My first paid for programming job was in 1979, but I wrote my first program in Algol on paper tape in 1976. Really like this guy because he speaks my language, and calls out the downsides and very real practical impact of today's fashionable sacred cow practices.

@veritypickle8471 9 ай бұрын

ty for your service

@huso7796 9 ай бұрын

Oh cool, like honestly. What were you programming? How was the debugging process without fancy IDEs? What was it like to describe your job to other people? Could you elaborate more if you don't mind how different it was compared to today's way of software development?

@cylian91 9 ай бұрын

as old as turbo C ! (anyone know a good decompiler for dos ?)

@tenisviejos 9 ай бұрын

You know a person is really smart when they can break down complex concepts to other people. The pipeline explanation was *chef's kiss*

@XDarkGreyX 9 ай бұрын

Chat went on about the transfer between the machines, which I noticed too but... should he have addressed that when it comes to the hardware pipeline?

@proceduralism376 9 ай бұрын

@@XDarkGreyX The transfer would basically be instant, it's just a bunch of clocked latches that separate each stage

@ApplesOfEpicness 8 ай бұрын

The laundry machine analogy is like the standard go-to for explaining pipelining. The buffet analogy also works.

@BrunodeSouzaLino 8 ай бұрын

Learning how to teach is a skill which has nothing to do with the knowledge you're teaching.

@TheVoiceofTheProphetElizer 8 ай бұрын

@@BrunodeSouzaLino I feel as if millions of tenured researchers with teaching loads cried out, then were suddenly silenced. Perfect way to sum it up. If only the vast majority of people that taught realized it was so much more than verbally repeating something to a room full of 20 somethings.

@mansquatch2260 8 ай бұрын

I looked it up on wikipedia. It's called a die, because: " Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade silicon (EGS) or other semiconductor (such as GaAs) through processes such as photolithography. The wafer is cut (diced) into many pieces, each containing one copy of the circuit. Each of these pieces is called a die."

@xponen 7 ай бұрын

paraphased from Chatgpt when queried "semiconductor die etymology": The word "die" in this context comes from the Middle English "die," which referred to a small cube or object. In manufacturing, a "die" is a specialized tool used in industries like metal stamping, and semiconductor fabrication to cut material.

@mansquatch2260 7 ай бұрын

@@xponen Seems chat GPT confused three different uses of the word and morphed them together.

@FriesOfTheDead 7 ай бұрын

@@mansquatch2260 You basically said "It's called a die, because: It's called a die", well done, slow clap.

@mansquatch2260 7 ай бұрын

@@FriesOfTheDead it's called a "die" because it's one of many diced up things.

@EyebrowsMahoney 6 ай бұрын

@@FriesOfTheDead You think you're smarter than you actually are.

@timseguine2 9 ай бұрын

One thing I think the author of the article doesn't seem to get, is that if you follow the arguments they actually make to their logical conclusions, you don't end up with RISC-V or ARM, you'd more likely end up reinventing Itanium.

@mrbigberd 8 ай бұрын

You can't go that far because the halting problem gets in the way. By the last iteration, Itanium was a bog-standard architecture that basically ignored the whole VLIW aspect entirely.

@SimonBuchanNz 8 ай бұрын

Just encode the processor control lines directly! Pay no attention to how you actually read the instructions...

@mrbigberd 8 ай бұрын

@@SimonBuchanNz the ISAs don’t just vary in syntax, but in semantics and in guarantees. These guarantees must be fulfilled.

@acheleg 8 ай бұрын

Conroe

@hoeding 9 ай бұрын

Washer / Dryer metaphor for pipelining nailed it.

@Dongdot123 8 ай бұрын

Damn right we just understood it so easily with that explanation

@ylstorage7085 8 ай бұрын

ford's assembly line could have served better

@Aberusugi 8 ай бұрын

Yeah, I finally have a way to describe the concept to other people. Very helpful.

@markteague8889 8 ай бұрын

The fast-food drive thru is a pretty good one!

@_somerandomguyontheinternet_ 8 ай бұрын

Yup! Using that from now on!

@nowaymyname 9 ай бұрын

As someone who is currently learning x86 ASM at college right now, I feel like I've learned more by Casey in one hour than I have all semester. Please bring him back, awesome content! Full-time content creator Prime has so far not disappointed.

@OpenGL4ever 9 ай бұрын

You might also search for "The Intel 80386, part 1: Introduction Raymond Chen" and read part 1 to n.

@RetroPcCupboard 8 ай бұрын

They actually make you study x86 ASM at college these days? Back when I was at university they did cover ASM for a simple microprocessor (I forget which). But not the x86 architecture. That was in the late 1990s. ASM is irrelevant for most software developers these days. Compilers will typically produce more optimised machine code than you can do manually with ASM. Unless you really know what you are doing and there is a specific case that the compiler does badly at (or can't do). ASM is useful to teach you the inner workings of a CPU though.

@OpenGL4ever 8 ай бұрын

@@RetroPcCupboard Knowing assembler helps you understand what the compiler produces and how high-level languages work. I think this is very valuable knowledge, it's like Latin for languages.

@RetroPcCupboard 8 ай бұрын

@OpenGL4ever Sure. If you are interested in that. I think most developers these days don't really care how the compiler works or even what the inner workings of a CPU are. I actually find it fascinating. Despite the fact I have been a software dev for 25 years I am only now learning x86 assembly. I have an old Pentium MMX PC that I am using for the purpose. I realise that I could do it on a Modern PC. But I feel that a slower PC makes more sense for seeing the impact of ASM vs older compilers.

@Pootie_Tang 8 ай бұрын

@@RetroPcCupboard man, someone study computer engineering, how can we not study x86 asm if we study how to develop said processors? =)

@cubbucca 9 ай бұрын

just got talked out of buying a Washer Dryer Combo

@XDarkGreyX 9 ай бұрын

@@squishy-tomato is that a single or someone in a two-person houshold talking?

@Cadaverine1990 9 ай бұрын

@@XDarkGreyXtwo person? Have kids...

@cadekachelmeier7251 9 ай бұрын

They're pretty great since you don't have to bother moving the clothes half way through. So you can throw a load in before bed or whenever. The main thing is that the drum for a dryer is about twice as big as a washer for a given capacity. So you can easily add too many clothes for it to dry well.

@michaelb4727 9 ай бұрын

If you buy two combo machines, it takes up the same amount of space as a standalone washer and standalone dryer, but it's actually faster because you don't have to wait for the last drying cycle in the pipeline, and also you usually get bottlenecked by the drying time on most commercial machines, so you would effectively have two dryers. (Also, you can be more lazy and leave more clothes in the dryer overnight.)

@PennsyltuckyPhil 8 ай бұрын

In the scheme of uOPs I would have two dryers with different capabilities. I have a small spin dryer as well as the conventional dryer, washer occasionally refuses to run spin necessitating a branch to move the items into spin dryer before normal move to conventional dryer.

@ristekostadinov2820 9 ай бұрын

The person who wrote the article about risc-v taking piece of x86 forgets to mention that no company who makes risc-v processors use it vanilla, si-five not only design soc architecture they develop extensions (more complex instructions) to solve the problems they need. Maybe tiny microcontrollers use vanilla risc-v that's what makes them cost 20 cents (besides free isa), but for high performance computing they does similar stuff ARM/x86.

@nicostein9875 9 ай бұрын

Having a good teacher successfully explain something novel to you feels like watching a magician.

@TheHighborn 5 ай бұрын

My university teachers wish they were as good as a random KZbin channel....

@brennan123 4 ай бұрын

Loved this. I used to program almost daily in assembly language (MOS 6510, x86, Hitachi H8/SH) up until about the original Pentium days. Knew the opcodes and their encodings inside and out and could write simply assembly programs with just a hex editor (no assembler). Never really did any 64bit stuff. This is fascinating how much things have changed since then. Never knew about the table showing how things can be scheduled with the different ports. Haha, makes me remember the 320x200 screen resolution and needing to do get the pixel's memory location (y*320 + x): XCHG AL, AH (swap the low 8 bits and high 8 bits, multiples by 256 effectively: y*256) MOV AX, BX SHR BX, 2 (shift right by 2 bits, divides by 4 so: y*64) ADD AX, BX (adds the y*256 + y*64 to get y*320) All this because it was like 5x faster than MUL AX, 320 Can't believe I can still do this from memory without looking anything up. lol

@pbentesio 9 ай бұрын

Casey Muratori is on a short list of people who motivate me to keep learning. It is inspiring to see people this knowledgeable about the subjects I love.

@mjthebest7294 9 ай бұрын

Him and Jon Blow are my top ones.

@dimitrioskantakouzinos8590 7 ай бұрын

Where can I find more from Casey Muratori?

@alexandertownsend5079 6 ай бұрын

The channel Low Level Learning@@dimitrioskantakouzinos8590

@alexandertownsend5079 6 ай бұрын

Thor from Pirate Software is high on my list.

@tamertamertamer4874 9 ай бұрын

I‘m dyslexic. So I have a dyslexic dude reading for me lmaooo.

@cat-.- 9 ай бұрын

I'm not dyslexic, but I'd like to think I am to justify me reading very little

@andrewdunbar828 9 ай бұрын

Man, I'm not dyslexic but I'm still the slowest reader I've ever met.

@ark_knight 9 ай бұрын

If it helps, you can think of it as "pipelining". If you were just reading it yourself, you would be reading it yourself. But since you are hearing him read it, you can go do other task. Cue multi-threading. (Or get entertainment out of it while still learning)

@XDarkGreyX 9 ай бұрын

I can read fast but just as in school I may need to read a sentence 10 times even at low speed to even just barely get it.

@tamertamertamer4874 9 ай бұрын

@@ark_knight lmaoooo true

@aliasjon8320 9 ай бұрын

Are we also going to get a "X86 doesn't need to die" with Primes face photoshoped onto Mercy from overwatch as the thumbnail

@MrHaggyy 9 ай бұрын

XD the mirrored Mercy from upcoming season would fit great.

@technomancer75 9 ай бұрын

While riding on a fake horse ;)

@XDarkGreyX 9 ай бұрын

@@technomancer75 should be cow he is riding on. He owned a cow once, or does own them still.

@miroslavhoudek7085 9 ай бұрын

I worked on this rocket launcher once, and the on-board software was compiled for both the ARM based embedded in the rocket - but also for Intel PC development workstation for easier testing. So we had a little-endian code and a big-endian code produced. And we were doing bitwise arithmetic and network transfers that had to work in both environments, all that in Ada 2012. I'm confused still until today.

@OurSpaceshipEarth 8 ай бұрын

nasa loves those coc pc on chip ecc ram [x3 mCHINES IN CVASE SUBSPACE TREKKING BIT FLIP DISAGREEEMENT]

@cloakbackground8641 9 ай бұрын

I've wondered from time to time if it'd be easier to just write μops directly and peal back the abstraction, but Casey explaining it as _compression_ made it suddenly make sense: CPUs are much more limited by data transfer than processing.

@warpspeedscp 9 ай бұрын

You'd be going back to itanium and ps3 cell processor level if you did that, haha

@kainlamond 7 ай бұрын

@@warpspeedscp hey those cell SPEs can still do some Interconnection tricks that we are just now getting to comparable speeds, i wonder what the PS3 would of been if the cell CPU has 2 PowerPC cores

@warpspeedscp 7 ай бұрын

@@kainlamond well, true...

@leeroyjenkins0 7 ай бұрын

But does the latency matter that much if you're constantly executing stuff though? You care more about throughput than anything as far as instructions go. My main takeaway was more that the CPU has a front-end to translate the instructions and a back-end to actually execute them and it makes sense to let the CPU dispatch higher-level assembly instructions as it sees fit rather than micromanaging the way the instructions are executed. Basically assembly is the abstraction layer that allows CPU designers to change the microops however they want to optimize it without software needing to be recompiled every time they change the smallest thing on the CPU, so it makes sense to have it be very descriptive so it can translate into microops in the most optimal way for the current physical layout.

@seinfan9 5 ай бұрын

As your program grows larger and more complex, you'd quickly realize that having to direct what happens with the architecture every step of the way becomes a nightmare of overhead that will have you spending much more time coding, potentially to a point that makes finishing the program a near impossible task if the goal is to speed up execution. Emulating the pipeline, whether or not you'd allow branch prediction and dealing with cache misses, handling interrupts and saving the pipeline state, stopping invalid memory accesses, the determination of allowing for different logical units to handle mathematical and Boolean processes simultaneously... Needless to say, it wouldn't be easier.

@sdwone 9 ай бұрын

So glad people like Casey are still out there fighting the Good Fight! Because the way things are going, computers and software development in general, will get so complicated that only an elite few will truly understand it all. And those elite few will have unprecedented power! So yes! I'm not saying that all developers need to get a degree in all this low level stuff... But that, the more of us that know, even roughly, how a computer actually works, the better!!!

@XDarkGreyX 9 ай бұрын

More and more people use hammers but fewer and fewer know how do build or at least understand them? Applies to countless fields, but would that be a valid metaphor?

@sdwone 9 ай бұрын

@@XDarkGreyX Yeah... That metaphor sounds totally reasonable to me! 👍🏼 Particularly in this industry.

@teaser6089 6 ай бұрын

It's a good thing. Programming shouldn't require the extend of knowledge that those who design the processors do. The democratization of programming by the development of High Level Programing Languages is the best thing that ever happened to the landscape of development. Yes high level programming languages are often less efficient than low level languages, but not everything has to be written most efficiently, with current hardware we can prioritize the ease of coding and readability over pure performance.

@kurku3725 20 күн бұрын

@@teaser6089 no... the problem is that everything becomes super mysterious, like what is a jit, what is a stack, why it has this strange limit, what is a heap, why people are talking about these things, how compilers work, what is the malloc, how it does the thing, why it is even needed and so and so forth, why segfault is the thing blah blah blah, just neverending stream of buzzwords with weird explanations which kinda make sense, but lacking something real and understandable when you can actually just spend 5-10 hours of your time, learn a little bit of C, GDB or just use visual studio and peek at memory and bytes and see what actually is happening with your eyes without these handwavy explanations and everything is going to be super clear from now on like freaking yeah, RIP, which is $pc in gdb i can tell computer: `go there execute this thing` and it will do so I can do it in debugger interactively I can edit memory myself, why not I can get a segfault, catch it, and fix the bug just writing directly into the memory the sequence of bytes my computer, my program, it does what I tell it to do go check handmade hero `intro to C section` from Casey very old streams, and he does not explain everything in super-duper details but basically after them you could probably create a simple JIT interpreter you just need to define semantics, write the parser, make the translator, spit the code, mark the page executable, patch the bytes and let it run from there rinse and repeat and that is just wow I litterally transcended during these 10 hours of watching his explanations and going through the debugger

@P-39_Airacobra Күн бұрын

@@teaser6089 Making hardware more complex doesn't make software less complex. Compilers are becoming more and more archaic and untouchable because hardware developers expect to be able to do anything and just have software developers support it.

@channel11121 9 ай бұрын

Casey was so disappointed when he didn't understand why little-endian was better, and also didn't care enough to understand it.

@andrewdunbar828 9 ай бұрын

I think it only occurs to us when we've done assembly programming, or bit-banging level C programming.

@YaroslavFedevych 9 ай бұрын

@@andrewdunbar828or speak German, numerals from 21 to 99 are little-endian there. Or when we add/subtract/multiply on paper, somehow it’s easier to do little-endian.

@andrewdunbar828 9 ай бұрын

@@YaroslavFedevych Ja ich weiss es! In English numbers are big endian and can make writing functions that do things like decimal conversion, ASCII conversion, adding thousands separators a bit unintuitive or at least more tricky. I started assembly on the Z80 which was little endian but lost the feel for it so much after moving to the big endian m68k that little endian never felt natural again.

@realmarsastro 9 ай бұрын

@@YaroslavFedevych gimme 99 of them luftballons, amirite? In Norway we actually can pronounce the numbers 21-99 in both big endian and litte endian. The little endian variant is more common with older people, it's unfortunately dying out.

@arnesl929 9 ай бұрын

Yeah, even I understood, I was a bit surprised by the lack of enthusiasm.

@Dom-zy1qy 9 ай бұрын

At the washer dryer and washer/dryer analogy, i thought he was going to say the washer dryer combo was faster because I always forget to swap the laundry to the dryer when it finishes... So it ends up taking like 2 hours extra. It was a good analogy though, I actually didnt even know about micro ops before this, but it makes a lot of sense.

@darekmistrz4364 9 ай бұрын

I also wanted to say that I forget the laundry or that you need a specific person to sit next to the washer not to forget to move it from washer to dryer

@GeorgeGzirishvili 7 ай бұрын

1:01:24: Correction: "Reduced" in the Reduced Instruction Set Computer (RISC) doesn't mean fewer instructions but simpler instructions; in other words, instructions reduced in complexity or doing fewer operations per instruction, as opposed to the Complex Instruction Set Computer (CISC).

@lesterdarke 9 ай бұрын

I would be interested for you to have this guy come on and talk more about the differences between arm vs x86. In particular what actually makes arm more energy efficient - everyone always makes it sound like its down to cisc vs risc, but this video makes it sound like this may not be the case.

@coolworx 9 ай бұрын

I'm not nearly a programmer. I'm just an arborist, who gets up at 5am and works in the orchards of the Okanagan, But don't get me wrong, I love the job I fell into. I'm a fruit tree doctor, and I've saved many a patient! But in the evenings I'm a hobbyist hack, building personal projects because it's fun. I fkking love this channel. And this episode was outstanding. Prime was asking a lot of questions I was having, and the presenter was excellent. Ohh I'm sure the information will never be put to direct use, by myself. But it's good to know how the things in your life actually work. I certainly have a better understanding of the term _microcode_

@XDarkGreyX 9 ай бұрын

Huh, really interesting that you are here. Kudos. *clap clap*

@loo_9 9 ай бұрын

most of programming are techniques only an engineer truly needs to do. but computer science is a beautiful subset of math that anyone can appreciate. as an arborist you are likely implementing the same optimization techniques that CPUs use without thinking about them, multitasking, task overlap, task reordering, pipelining, etc. it’s just a definitive way to think about the world

@coolworx 9 ай бұрын

@@loo_9 Like i said, I've been building side projects for a while. One of them is a nifty nodejs and nedb tree notes app that lets me keep track of progress and search for any key words or time intervals. Now I want to get some charts going to show trends. I have almost 10 years of data, that I've been gathering. Including daily weather conditions. So... ya. I tree surgean by day, and code at night. And the best part is, I have no deadlines, no HR directors, no tiresome teammates... ;-)

@lritzdorf 9 ай бұрын

> But it's good to know how the things in your life actually work. This. This is the reason I enjoy computing, from both a hardware and software perspective. Yes, the tech is cool, but what I truly love is understanding what the heck these magic boxes of lightning and sand actually _do._

@allmycircuits8850 8 ай бұрын

As an arborist, you're officially a HACKER :)

@QuicksilverSG 9 ай бұрын

In reality, all generations of X86 and x64 assembly language have been emulated in microcode since the advent of the Pentium P6. The underlying cpu hardware contains banks of interchangeable 32-bit and 64-bit registers, along with RISC primitive instructions that operate on them. Both X86 and x64 assembly language instructions are parsed by a cpu hardware interpreter that converts them into streams of microcode instructions that are speculatively executed in parallel by multiple internal execution units. This is the actual Intel "machine code", and it is not possible to manually program with it. Human assembly language programmers can only use hardware-interpreted X86 and x64 instructions, the underlying Intel microcode is locked inside the cpu. Decoding X86 and x64 assembly language can actually run faster than an ARM cpu executing manually programmed RISC code. That's because Intel assembly language is more compact than RISC machine code, and can thus be loaded more quickly from memory, which is often the limiting factor in code execution speed. Underneath the hood, both Intel and ARM cpu's are highly optimized RISC machines. The difference is that ARM assembly code is executed directly by the cpu, while Intel assembly code is virtualized and emulated by internal microcode.

@astrixx 9 ай бұрын

That was a pretty useful explanation.

@theexplosionist2019 9 ай бұрын

I read they use 70~ bit registers to store flags + GPRs.

@adrian_b42 9 ай бұрын

You are right. The only complaints about CISC in x86 are the variable length of the instructions that makes decoding a more complex and also the rarely used instructions carried over the decades.

@mitrus4 9 ай бұрын

Are you sure about your last point? With ARM you can start decoding instructions in advance, because all of them are 4 bytes each, so you don't need to wait for decoding to get the address of the next one. With x86, on the opposite side such a dependency of variable length doesn't allow for it. I think it can outweigh the increased pure count of simple instructions, especially if there is no code bloat and hardware instruction memory prefetching does its job well.

@QuicksilverSG 9 ай бұрын

@@mitrus4 - Program memory is loaded from RAM into processor caches, which on Intel chips are divided into interchangeable 64-byte cache lines. The cpu instruction decoder has no need to calculate the RAM address of the next instruction, it relies on the program counter to automatically keep its local instruction cache full. If that instruction cache ever underflows, the instruction decoder will have to wait until the cache is refilled from RAM. However, cpu execution units and data load/store units can continue to process previously decoded instruction micro-ops, which proceeds speculatively in parallel with instruction decoding. In practice, it is far more common for execution to stall due to logical or algorithmic dependencies than for speculative execution to outrun instruction decoding. With ARM cpu's, each machine code instruction is four bytes long. On Intel cpu's, the most common instructions take just one byte, though complex instructions can be up to fifteen bytes long. On average, Intel machine code tends to be about half the size of ARM code.

@piotrj333 9 ай бұрын

This is garbage article. First, x86 is also RISC architecture for long time. CISC instructions become RISC internally in CPU, and fact we don't use RISC directly, only costs us 5% at most 10%. That is essentially entire cost of X86. 2nd. AMD ryzen for laptops in energy efficient configurations can compete with Apple M processors, at least in work done per joule. There is some problems with idle draw, but those can be attributed to Windows and itself SoCs (like Apple has soldered RAM really close to CPU) not to architecture again. 3rd. Spectre and Meltdown affected x86, ARM and even IBM power architecture.

@abbe9641 9 ай бұрын

Yeah chuck a good mobile Ryzen processor to linux instead and the difference i heard is night and day with battery life improvements in the 10's of %

@gigitrix 9 ай бұрын

Don't disagree but that 5/10% on something that could be delegated to compilers in a post-moore's law world is more relevant than you might think

@neoqueto 9 ай бұрын

Modern ARM and x86 and even RISC-V ISAs are pretty much all nearly identical. They can all do the same things and the notion that ARM is a "less complex" architecture because "ARM is RISC and x86 is CISC" should only be made fun of. However we are at the point when we have to scrape the bottom of the barrel for miniscule gains.

@TheSulross 9 ай бұрын

Yeah, I don't want the main RAM as a permanent, unchangeable fixture of my computer the way Apple does things now. Now is true that one could produce a lower transistor count x86 if stripped out all of its legacy stuff and only implemented the instructions used by modern software and operating systems. That would be an interesting project (and am suggesting cleaving more than just the CPU coming up into 8086 real mode). After all, any modern CPU per its performance can more than adequately emulate vintage CPUs of the '70s, '80s, up to mid 90s - for those that want to play their fav retro games. Retro computer emulators on ARM - like the Pi - prove this every day. So x86 doesn't need the transistor real estate that is dedicated to supporting stuff that are vestiges from 40 years ago. Git rid of that fat to lower the power draw. And aren't there too many redundant SIMD instruction sets - why not trim that down to what is only in vogue in, say, last decade?

@neoqueto 9 ай бұрын

@@TheSulross especially with EFI being so commonplace, a lineup of "stripped down" x86 parts could maybe be viable... but then again they have hundreds of engineers wracking their brains all day long about performance and power efficiency improvements so it's not like they haven't thought of the same ideas as us smoothbrains have (and there's probably reasons they concluded they're stupid).

@reaktorleak89 8 ай бұрын

So someone bought a Macbook, loved the battery life without researching why it was so good, and then wrote a Kill x86 article?

@lostcosmos3245 12 күн бұрын

yup! pretty much like every internet journalist without any real investigation or research just spewing biased opinions instead of facts.

@carloslint9914 7 ай бұрын

2 cents: Last time Intel tried to break compatibility with x86 was the Itanium, which is now a footnote on modern CPU's history. Most people never heard of it.

@Cmanorange 9 ай бұрын

39:35 casual magic

@Nirsi 9 ай бұрын

"I'm ancient" yeah sure "I was professionally programing since 1995" well you program longer than I'm alive, you earned that title

@darekmistrz4364 9 ай бұрын

He looks like he was born in 1988 at most. He must have been programming at the age of 7! No wonder he is a genius

@PennsyltuckyPhil 8 ай бұрын

I thought the definition of ancient was knowing of EBCDIC and having an IBM 370 gold card within arm's reach.

@TheVoiceofTheProphetElizer 8 ай бұрын

Unless you're solving math problems using vacuum tubes, I'm not sure what exactly you all are defining "ancient" as.

@PennsyltuckyPhil 8 ай бұрын

@@TheVoiceofTheProphetElizer Well I guess we could go back to where a program's bug is a moth caught in a relay ...

@JacobBogers 8 ай бұрын

I programmed assembly since 1983 (was a 14 yo kid back then)

@bzboii 9 ай бұрын

(optimizing compiled language argument) people write for the llvm using their language and the llvm writes itself for the “API” of the cpu which is the instruction set like x86-64 and the instruction set writes for the microcode and the cpu can only “do” certain actual things at the end of the day optimizations at every layer, it’s turtles all the way down until the actual cpu architecture

@darekmistrz4364 9 ай бұрын

And then you need to remember that all that computer is doing is juggling memory: from disk to ram, from ram to L3 cache, from L3 to L2, from L2 to L1, from L1 to CPU and all the way back to disk.

@theondono 9 ай бұрын

The guys at the Oxide Computer podcast (ep 1) covered the reason for Little Endian with Jeff Rothschild, and apparently the reason for that choice had to do with earlier cpus having 1-bit length ALUs. This meant that you had to “stream” the bytes into the ALU, and little endian helped simplify the carry logic.

@warpspeedscp 9 ай бұрын

Its nice that LE still ends up being useful when things are scaled up

@nahkh1 9 ай бұрын

For those who are curious, the tools used for cutting a specific shape out of a material is called a die: en.m.wikipedia.org/wiki/Die_(manufacturing) I'm assuming the cut pieces (e.g. CPUs) took on the name of the tool used to cut them over time. I'm also pretty sure that they don't use literal dies anymore to cut out the individual chips from the silicon wafer.

@blarghblargh 8 ай бұрын

didn't find anything that's authoritative on this, but I did find a few results referring to the process of cutting up a larger item into square pieces being "dicing". the results of that are "dice". an individual item is a "die", and found results saying that was the etymology for integrated circuit dies. CPUs aren't stamped/casted/extruded, and never have been, which is why I am not sure the linked wikipedia die concept applies. But I can't fully argue either way. What I can agree with is it is related to the manufacturing process. We can be pretty sure of that.

@nahkh1 8 ай бұрын

@@blarghblargh the specific use of die I mean from the article is a stamping die. It's basically a matching set of "knives" with a complex geometry. The dies are pressed together with the material in between, and while I doubt that's how silicon chips are cut these days I can believe that's how it would've been done in earlier days.

@autarchex 8 ай бұрын

@@nahkh1 Integrated circuits are batch manufactured in a grid pattern on a wafer of (most commonly) pure crystalline silicon. The individual product pieces are separated from the wafer ("singulated" or "diced") using a saw, which converts the wafer into a large number of identical small parts collectively called "dies" or sometimes "dice" (older usage) and one of these is called a "die," and this term in a semiconductor industry context always refers to the product and not the tool. There are a few other rarely used singulation techniques other than a saw, like laser cutters, waterjet cutters, even particle beams that can cleave the wafer by precisely imparting millions of crystal defects in a line. As far as I know we never used stamping die-cut (where "die" means tool, not the product) techniques though; silicon crystal is quite hard and brittle, and the wafers chip and shatter more resembling thin discs of glass than thin discs of metal. Nonetheless, I'm sure the terms share a common history. There are other contexts too, where die and dice refer to the product and not the tool that made them - for example, playing dice.

@rustyshaklford9557 8 ай бұрын

As someone who works in industrial stamping, I can confirm that dies (in combination with punches) do indeed cut things.

@haraldfielker4635 9 ай бұрын

20 years ago - same talk :) The solution in ~2000 was "Intel Itanium" 😂😂

@stepank1 9 ай бұрын

Yeah people blaming intel / AMD for "bloated" x86 and "tech debt" as if this decision was not made solely to please the customers is pretty incorrect

@AlecThilenius 9 ай бұрын

I had this thought too. Intel tried to fix these issues in Itanium. It's now nicknamed i-tanic because no one wanted to recompile their code to run on Itanium. I'm not at all a fan of Intel, having worked there for 2 years back in college many moons ago, but you can't soley blame them for x86 legacy.

@stevesether 9 ай бұрын

@@AlecThilenius as I recall, Itanium never achieved the supposed performance gains, even if you DID recompile your code. In the Linux world, Debian alone supports 4 different CPUs. x86, MIPS, Arm, and PowerPC. Re-compiling isn't really a problem. I believe Itanium was supported. The servers themselves were freaking EXPENSIVE. I've honestly never seen one, used on, or worked anywhere that had one.

@timothygibney159 8 ай бұрын

@@stevesetheryou underestimate how much technical debt for legacy dos and Windows 95 software run on modern business. No compiling Deb files for PowerPC or arm won’t help run accountings old Oracle macros written vb 5 back in 1998 in Excel only.

@methanbreather 8 ай бұрын

@@AlecThilenius it wasn't just recompiling. Itanium is just a slow architecture. Every architecture that relied on the compiler do do the work for the cpu failed.

@Dungeonseeker1uk 9 ай бұрын

Fun Fact: Intel used Pentium over 586 in a stupid attempt to stop the clone market. They couldn't trademark a number so they used Pentium & added the MMX extension which was also trademarked. OFC AMD and VIA just used 586, they didn't get MMX till much later when it was basically redundant, AMD created 3D Now as a response when they adopted the K6 moniker (K6 was 686).

@kahnzo 9 ай бұрын

I had completely forgotten that fact, but that's right!

@craigpeacock1903 9 ай бұрын

Ah, the K6... the first cpu I bought myself was the K6-2 600...

@betag24cn 9 ай бұрын

afik, amd still has those instructionsets from k6 era

@OpenGL4ever 9 ай бұрын

@@betag24cn No, 3dnow! is obsolete and got removed from newer AMD CPUs. There was an agreement about the SSE family of instruction sets, thus 3dnow! was no more needed and it was incompatible with SSE.

@betag24cn 8 ай бұрын

@@OpenGL4ever now that youmention it, i have not checjed that in many years, probably you are right

@TurtleKwitty 9 ай бұрын

I forgot those transparent board exist for a sec so when he started writing in the air I was so confused and amazed XD

@jewlouds 9 ай бұрын

I was more impressed he 2as writing backwards

@tehwibe 9 ай бұрын

@@jewlouds Nah, the camera is flipped horizontally

@XDarkGreyX 9 ай бұрын

The scroll-up got me

@scito1541 7 ай бұрын

aka glass?

@TurtleKwitty 7 ай бұрын

@@scito1541 Way too brittle, it's plexiglass if some kind and they tend to have lights if memory serves so the writing can slightly glow with neon colors

@BilalHeuser1 8 ай бұрын

When I started to write assembly languages for my TRS-80 which was Z80 based, I also learned about the Intel 8080, which is what the 8088/8086 CPUs were based on. Much of the register structure was duplicated on the Z80, but Zilog added extended instructions. Knowing that makes understanding the 8086 and subsequent generations that much easier.

@freyja5800 9 ай бұрын

One thing about the conclusion that stood out to me is that the article makes out of order, speculative execution & superscalar seem like they are bad things, while they are things you actively want in your compute hardware. Like, to make things faster you can either a) do the same stuff, but faster, or b) do more stuff, but in the same time (i.e. faster clock speeds vs. parallel computation). and while there definitely are situations where the first is the only way (dependent operations etc), if you have the choice the second is always more preferable, because the energy required for running it at a lower speed but in parallel are much more favourable, to the point where it is useful to have dedicated low-speed hardware for certain tasks (accelerators, in particular gpu's are a prime example of that)

@leeroyjenkins0 7 ай бұрын

I don't think they are painted in a bad light and I think the presenter here was just going in with the preconceived idea that the article is wrong, can make no valid point and that everything they write is a criticism of x86. The article describe the fact that speculative execution is a thing that is done on modern CPUs, and then they try to argue why having a more complex instruction set makes these things more difficult to do. Namely because it's difficult to figure out what instructions depends on which other instructions when they're so complex. So having complex instructions makes out-of-order execution slower (because there are more checks to do between two given instructions?), seems like a valid point to want to make. I don't know that it's true, but I also will never know because instead of discussing that we're arguing that the writer of the article knows nothing about anything.

@boptillyouflop 3 ай бұрын

@@leeroyjenkins0 IRL most of the badness of x86 is determining instruction length (which usually gets cached nowadays) and 286 protected mode stuff (the mode in which Windows 3.1 worked). Once you split instructions that both load and do math (ex: add eax, [ebx + ecx*4] -> mov temp, [ebx + ecx*4], add eax, temp), basically every instruction reads 2 registers, writes 1 register and maybe updates flags. Which is why they didn't have *that* much trouble making x86 Out-Of-Order CPUs (PPro came out in 1995, one year after the PowerPC 603 but the same year as MIPS R10000 and PA-RISC PA-8000 and SPARC64).

@jaysistar2711 9 ай бұрын

Von Neuman architecture basically means that code and data can be in the same address space. Harvard architecture, like an 8051, has code and data in 2 seperate address spaces.

@thewhitefalcon8539 9 ай бұрын

And Modified Von Neumann means they have separate caches.

@catcatcatcatcatcatcatcatcatca 9 ай бұрын

To me this was taught with emphasis on different busses: instruction bus and data bus can be accessed simutaniously, and can have different widths. For example 14-bit PIC controllers have 14-bit instruction bus, but 8-bit databus. Any modern microcontroller needs an ability to write instructions while running and execute them later: I don’t want to rewrite the Rom of my laptop everytime I install something. And the kernel needs some way to give userspace code access to the CPU (the alternative would be virtualisation, I think). Either way the relevant consepts of privilege are so far removed from the hardware that the distinction between Von Neuman and Harvard architectures isn’t as meaningful as their orginal definitions.

@ern0plus4 9 ай бұрын

The idea of von Neumann's idea is not about address spaces, but about the representation of the program: if it's represented as data, we need no extra mechanism to handle it (load, save, execute). The executor unit (CPU) already should access memory, loading and storing data from/to, so if we represent program as data, it's not a big deal, we can use common mechanisms for it. I don't know how computers looked before this idea, AFAIK, the programming was done by re-wiring the whole computer physically for the specific task (aka. program). Storing program in memory is a brutally significant improvement: need longer program? just add memory; bug in program? just modify some bytes - etc.

@Eugensson 9 ай бұрын

There's also Mill architecture

@MrHaggyy 9 ай бұрын

@ern0plus4 in the really old days, when computer where mostly mechanical devices like the enigma machine, you had mechanisms for "instructions" and another medium like paper cards for "data". When they started moving from mechanical to electrical they still handled instructions differently from data. It wasn't until the first silicon transistors that you made and handled data on the same material. Today it's a bit odd as on chiplets we go back to using different processes on the same wafer, so we somehow got different materials again. But basically any CPU today doesn't distinguish between code and data. But it might come back if people really do care about security. Harvard architectures are far more robust by design against injection attacks.

@jaysistar2711 9 ай бұрын

Actually, you have quite a few of us "ancients". I'm not even sure when I learned C, but 1996 is when I swiched from DOS to Linux. I got a Windows machine because people seemed to have them, so I had to recompile/test on it.

@PixelThorn 9 ай бұрын

What made you switch to Linux that early?

@idhindsight 9 ай бұрын

@@PixelThornnot op but similarly old, when win95 came out, I felt it “hid” the true OS and hated it. When I heard about Linux in 99/2000, I was an instant convert.

@jaysistar2711 9 ай бұрын

@@PixelThorn Windows 95

@darekmistrz4364 9 ай бұрын

@@idhindsight Hiding isn't bad. It's just a diffrent client use case. That helped Microsoft gain the popularity it has today.

@idhindsight 9 ай бұрын

@@darekmistrz4364 ehh I didn’t say it was objectively bad. It sure as hell isn’t for me (and most developers)

@nezbrun872 9 ай бұрын

8008 and 8080 8 bit registers are A, B, C, D, E, H & L, plus 16 bit SP and PC. H & L are commonly combined to make a 16 bit memory pointer. On the 8080, the B & C, D & E as well as the H & L register pairs can be combined, allowing up to three 16 bit registers, typically for memory pointers, especially HL, and to a lesser extent DE and BC due to the non-orthogonal instruction set. Furthermore, on the 8080, you can also do in-place 16 bit increments & decrements on these register pairs, but results don't affect the flags. HL can also be used as a 16 bit accumulator for 16 bit addition.

@X.A.N.A.. 9 ай бұрын

GameBoy?

@jaysistar2711 9 ай бұрын

@@X.A.N.A.. No, but a few arcade machines. The GameBoy uses a Rico clone of a Zilog Z80, which is related to the 8008, but not compatible, just as the 8086 is related to the 8008, but not compatible. The book called "Nailing Jelly to a Tree" is a good Z80 starting point.

@tconiam 8 ай бұрын

I was looking for this comment! The special purpose uses if the registers and limited addressing modes are the 8080 legacy issue. Compared to the Motorola 68000 series consistent registers and addressing modes makes programming the 68K a dream compared to Intel. Sadly they couldn't keep up in performance and lost out to x86.

@andrewdunbar828 9 ай бұрын

Even in the 8 bit days plenty of us used to program in machine code, simply because we didn't have assemblers. We had documentation of the bitfields of the instructions and addressing modes and wrote BASIC programs that put all the bytes we figured out on paper into memory.

@635574 8 ай бұрын

This sounds like medieval torture of computers. I wasn't aware someone could even program in machine code. Big copium.

@leeroyjenkins0 7 ай бұрын

@@635574 it's just assembler without names really, wildly inconvenient but not inconceivable

@leeroyjenkins0 7 ай бұрын

It's so funny to me that a computer would have BASIC but not an assembler

@sebastianhormann9261 4 ай бұрын

Motorola 68x assembler programmer here, absolutely digged that interview. People compare aplles and oranges and dont know what they are talking about in terms of architectures

@robgrainger5314 9 ай бұрын

Wonderful exposition by Casey there. I had a decent understanding of x86 architecture, but still managed to learn something, which is always a pleasure. Ps. Please remember to put links in your videos.

@kamikaz1k 9 ай бұрын

this is was so freaking helpful to understand. Prime thanks for forming your relationship with Casey; Casey, you should definitely piggy back off more creators' reach to share your wisdom. This is a win-win-win arrangement. 👏

@jaysistar2711 9 ай бұрын

In the original design, AX (16-bit) is AH (8-bit) and AL (8-bit). That's the `A`ccumulator. AX has overflow into the `D`ata register (DX, which also has DH and DL) for MUL and DIV instructions. There is a `B`ase index register and a `C`ount register. The other 4 are Stack Pointer (SP), Stack Base Pointer (BP), Source Index (SI), and Destination Index (DI). Some instructions only work with certain registers in Assembly (very CISC). EAX is 32-bit. In x86_64 RAX and other Rxx registers are 64-bit, and those original 8 register have general functionality.

@thewhitefalcon8539 9 ай бұрын

64 is RAX

@plaintext7288 9 ай бұрын

Assembly level debugging and optimization are black magic

@virno69420 9 ай бұрын

Most calling convention uses RAX for return value, the naming conventions are just legacy, it's not actually used as an accumulator. RBX is not the base pointer, that's RBP, there is no RSB register. What instructions only work with certain registers? This just isn't true I believe. Atleast for the general purpose 16, maybe SSE, AVX, or XMM registers are instruction specific idk we haven't got that far in my class yet.

@teodorkostov365 9 ай бұрын

@@virno69420 base in rbx doesnt mean stack base, it means memory base which is what it originally was intended for. and some instructions do use the registers with their original intention, like div, rep, movs etc...

@jaysistar2711 9 ай бұрын

@@thewhitefalcon8539 I edited to clear some things up, and fixed that typo.

@MrDivinePotato 8 ай бұрын

Great chat, love this kind of geeking out! I don't understand it fully but I feel like I got a slightly clearer picture from this.

@Maisonier 8 ай бұрын

You should get an e-ink device for reading. I like the Boox Note Air 3C (android e-ink tablet), but you also have the remarkable, supernote, papyr.

@johnyepthomi892 9 ай бұрын

Wow.. I I like this format. Very engaging and inviting.

@drooplug 9 ай бұрын

Ok. The glass board was an awesome surprise. I've seen it before, but it's still cool. Someone needs to tell me how he scrolled the the previous writing up.

@c0dy42 9 ай бұрын

its probably just a massive piece of glass that can move up or down with the help of motors

@APaleDot 9 ай бұрын

@@c0dy42 I believe it's a sheet of plastic that's rolled up on the top and bottom.

@Muskar2 9 ай бұрын

I take his course and he uses it all the time. I'm not exactly sure how he does it but probably a foot pedal. We saw the full height of the board in this video, so he has to wipe it when he needs to use more space than this.

@c0dy42 9 ай бұрын

@@APaleDot I thought that at first as well. but I think that we would be able to see that, because then there would have to be a big piece of acryl or Glas behind it, so he has a solid surface to write on.

@cybermuse6917 9 ай бұрын

Suggestion for the reason they call it a die/dye , is when coins were minted you would use a die/dye to cast a particular pattern from the metal being formed. I imagine its merely a reference to the pattern being used.

@OurSpaceshipEarth 8 ай бұрын

CEPT W MORE UV LIght less heavy hamnmer head bangers:)

@BrankoDimitrijevic021 8 ай бұрын

Funny thing in reference to 1:02:55 - files in a ZIP archive are compressed independently and you can "jump" to any of them using "central directory" which is physically at the and of the ZIP archive. So you could "decode" files from a ZIP archive in parallel, instead of serially.

@Ruuod 8 ай бұрын

17:20 isnt that paragraph ment to describe two things: 1. there are CPUs with multiple cores that have an ALU and CACHE for each core and can therefore work through instructions in parallel 2. when code is compiled , instructions can be rearranged to optimize runtime. for example when there is the need for waiting to set a variable x with x = a + b and a+b are still 'in calculation' other instructions like loading new data can be executed in the meantime. or am I wrong here. architecture classes is something I had like 10 years ago

@TheJukeJuke2 9 ай бұрын

I saw you drop a Wheel of Time reference in a video earlier today and now I have to watch you daily.

@UltimatePerfection 9 ай бұрын

Unfortunately that's not going to happen because of all the business software and games that requires x86. So any "replacement" would need to be backwards-compatible with x86, at which point it would be just x86 with extra steps, so why bother?

@jaysistar2711 9 ай бұрын

I don't think so. The Apple Macintosh stated on the M68000, then switched to PowerPC with support (an OS built-in emulator) for M68000 apps, which then switched to x86, which still had both PowerPC and M68000 support, then they switched to x86_64, which you may think was an easy jump, but probably required quite a bit of code page management in the OS, then they switched to ARM. Although they've dropped some of that emulation along the way, you can still get it, and things still can work as they always have. I think Windows and Linux could do the same. with QEMU in Docker, containers already do some emulation with buildx, so it's not too far of a jump.

@UltimatePerfection 9 ай бұрын

@@jaysistar2711 I assure you that gamers unwilling to take the performance hit associated with the emulation will stop the switch dead in its track. With Mac the thing is that a) it's a single machine by a single company so people literally have no choice but to switch if they want to use newest and best (debatable...) products, not thousands machines by thousands companies, some even being DIY affair, and b) Mac never really did games very well. Try running Cyberpunk 2077 on a mac. You can't. Or at least, you can't run it well.

@dragonproductions236 8 ай бұрын

@@jaysistar2711 The only reason why apple can do arm is because it's a horrible monopoly and people complained about the very real issues the switch caused. You're basically saying "The company store can switch and deprecate currency every year, why can't the government?" The answer is that it only exists due to artificial reasons ( You being horribly in debt to the coal company or either being too dumb to leave apple or unable to due to them grabbing your work software in their malformed tentacles).

@jaysistar2711 8 ай бұрын

@@UltimatePerfection I agree that Apple products are overpriced versions of inferior hardware, but I disagree about a performance hit being required to exist if the PC switches to another CPU because I don't know of any native version of the x86_64 at this point; the whole x86_64 platform is emulated on both AMD and Intel chips. Also, REDengine (Cyberpunk 2077) is a cross platform engine. It can run on the Switch (ARM), but doesn't because GPU performance and capability (ray tracing, API support, etc.) is more important for games than what CPU ISA they use. I've ported hundreds of games commercially, and I can tell you that Apple's mistake was in making Metal the only option moving forward. That's just more work, and, as a game engine dev, we're an overworked bunch as it is.

@UltimatePerfection 8 ай бұрын

@@jaysistar2711 Yeah... x86_64 being emulated on an x86_64 processor... Nice try, but no dice.

@CookieBeaker 9 ай бұрын

Loved this! I know you can’t go hyper technical every day but please sprinkle it in every so often! This was highly educational and on top of that reducing misinformation the article (intention or not) shared.

@robertthornton2132 4 ай бұрын

I am a software QA guy and although I do not need to know this information, I have not watched a better video in months/years. Fascinating and educational. Even thought I am still fuzzy on the lowest level of operations discussed, the levels above them now make more sense. I have had words and processes as ethereal concepts for years and I now understand them better. . Thanks.

@IkethRacing 5 ай бұрын

Playing devil's advocate here. For the washer/dryer example, if the washer/dryer combo takes up the same space and costs the same as a single washer or dryer, why not use 2 washer/dryer combos instead? Thinking about it, I guess the answer to my own question is the washer/dryer combo will always use more die space (the cost) than a single washer or dryer.

@robertfletcher8964 8 ай бұрын

the number of people who think he's writing backwards is hillarious

@PaulPetersVids 9 ай бұрын

Wow, maybe a top 5 video on the channel. This was amazing.

@darekmistrz4364 9 ай бұрын

What are other 4?

@PaulPetersVids 9 ай бұрын

@@darekmistrz4364 lol idk.

@ScottGrunwald 9 ай бұрын

Amazing Video!! Suggestion: Prime and Casey do a long form video on all the details of ARM vs x64/86.I would love to learn more about the differences of the two and why current ARM chips like M3 Max are just as performance as top intel mobile chips yet insanely more efficient. I know the node differences but that can't be all of it. Is the reduced instruction set really giving them that much of an edge? This video was amazing and I learned a shit ton, but have more questions. Sounds like decoding performance might be a huge factor comparing the two.

@azazelleblack 8 ай бұрын

It really is largely node differences. Apple is *two and a half* nodes ahead of Intel right now. Also, Intel has to design its CPU cores to work in everything from tablets to big iron servers. Apple has no such concerns. Finally, keep in mind that Apple also has full product stack control, so they make the hardware, the firmware, the drivers, the userland, etc. On the x86 side, every single part of that is done by a different company. So Apple can do optimizations that simply aren't practical in x86-land.

@lycanthoss 5 ай бұрын

If ISA was the reason for those performance differences then Intel and AMD would have the exact same performance because they both use x86. ISA doesn't matter. Microarchitecture does. Nodes and microarchitecture is what decides most of the performance/efficiency.

@von_nobody 9 ай бұрын

For decoding, but in some tight loops it do not matter if whole code block fit CPU cache and need only be decoded once. If 90% your code sit in couple of hot spots then all penalty from decoding is 10 time smaller and will not matter much. Its only matter for linear code where each instruction is hit once and is long time before program go back to this line.

@retronoby 9 ай бұрын

There’s perhaps also a similar problem with the newest instructions (newer than the oldest supported CPU for a given software). Most software don’t use them and detecting the supported instructions at runtime isn’t always practical. The legacy instructions are there because software is a very expensive thing to produce and corporations need to at least recoup the costs of old software before they either buy or produce new software. I wish Hackaday would focus more on DIY hacks like they used to, and less on opinion articles. Thank you for the great and in depth explanations, it was very useful.

@craiggazimbi 9 ай бұрын

The Name 🚀

@AregGhazaryan 9 ай бұрын

fireship?

@wanishitcha1150 9 ай бұрын

the rocket man

@infastin3795 9 ай бұрын

Even Intel themselves tried to get rid of that legacy but didn't succeed.

@radfordmcawesome7947 9 ай бұрын

are you talking about itanium or something else?

@infastin3795 9 ай бұрын

@@radfordmcawesome7947 yes. Intel has also recently proposed a new X86S architecture.

@betag24cn 9 ай бұрын

it was mostly microsofts fault really, microsoft windows is a legacy of code, nothing modern in it, not since windows 2000 basically with the arm windows project that should deliver some devices this year, perhaps that finaly happens, without intel it seems

@ThePlayerOfGames 9 ай бұрын

x86_64-v2 as standard when?

@LouisDuran 9 ай бұрын

soon

@williamhinz9614 9 ай бұрын

X86_64ex*

@deth3021 9 ай бұрын

Unlikely, intel has tried twice to change the x86 platform. Itanium, and pentium 4. Both failed, so anything they do will have to be backwards compatible.

@Kaznovx 9 ай бұрын

In 2020. However, x86_64-v2 is a codename for CPUs with at least SSE4_2 and SSSE3. (which was supported already by 2008 CPUs). For comparison, x86_64-v3 means roughly support for AVX1 and AVX2 (CPUs from 2014 and later) x86_64-v4 is with support for AVX512, but this one is a can of worms

@deth3021 9 ай бұрын

@Kaznovx OK you meant that. But that is an unrelated topic, as it is about common baselines.

@Waitwhat469 9 ай бұрын

28:30 I feel like that is a great example, but also in the actual laundry world, my rebuttal is that a special device that washes and one that dries with each taking up the same space as Washer/Dryer it would be 50% slower in total throughput. So, all I am saying is that we need to increase our WD core counts to dual core so we can finally wash/dry at a 50% increase!

@yanidoesit 4 ай бұрын

"could possibly" "could possibly" possibly could bro... I'm allowed to pick on you are I'm just as dyslexic (or worse). Love ya bro, you make boring interesting.

@quas-r 9 ай бұрын

Two phenomenal teachers talking about what they passionately love and explaining it to us is pure pure gold.

@Eren_Yeager_is_the_GOAT 9 ай бұрын

i hate it that i only have 2 options when i want to buy an x86 CPU

@joseoncrack 9 ай бұрын

Yes but it's better than no option at all.🙃

@betag24cn 9 ай бұрын

when we had via, it wasnt a option reallyrigth now you have two, but i wouldnt touch anything from intel

@betag24cn 9 ай бұрын

@@joseoncrackwell, in apple, there is no option and people is happy on android smartphones you dont choose, you choose the whole device it is nice to have options but sometimes things do work like that

@boptillyouflop 9 ай бұрын

Only 2 options for x86 CPUs is down to US government being feckless and derelict in its duty to break down monopolies (well, duopolies here but you get the gist). They let Intel/AMD use the x86 patent pool to completely own x86, to the detriment of everyone else. Citizens United lets big companies bribe politicians and this is how we got here.

@joseoncrack 9 ай бұрын

@@betag24cn It's good to have options. But yes, obsessing over it doesn't change a thing either: like, here, many people seem to absolutely despise the x86 architecture (often without even really knowing what it is now), but it just works well enough (at least on the desktop and server markets) and you'll be hard-pressed to find anything with the same performance at this price range. In terms of performance, one exception (really an exception for now) are the Ampera CPUs (128-core ARM-based), and these are still niche products, very expensive. The day there are alternatives to x86 for the same kind of applications, with as much performance and for the same price, but with more "modern" architectures, people will eventually switch en masse. But that's just not the case yet. It is for mobile devices, and has been for years, though, which is a market Intel has always struggled with.

@swdev245 9 ай бұрын

I was surprised that Prime seemed to be ignorant about the lower level (like virtual memory) and the x86 legacy stuff. Isn't at least the former part of every computer science degree?

@rusi6219 9 ай бұрын

He could have forgot it happens

@sheikhshakilakhtar1865 9 ай бұрын

Not in every institute. Nowadays, lower level stuff are studied more by electrical engineers than computer scientists. Also, people forget.

@chupasaurus 9 ай бұрын

IIRC virtual memory explanation isn't a part of courses required for BSc in CS🙃

@monsterhunter445 9 ай бұрын

@@bbourbakiyou don't need to be a low level guy unless embedded and even then there is still abstraction

@sheikhshakilakhtar1865 9 ай бұрын

@@bbourbaki Do you know what fixed parameter tractability is?

@emilemil1 2 ай бұрын

As for the legacy downside of the x86 instruction set, isn't there a cost there in terms of how many bytes long instructions have to be? I'd naively assume that you'd want the most frequently used instructions to use the fewest amount of bytes so there's less data that need to be fetched from memory during execution, but you can't exactly rearrange your instruction encodings every time a new instruction is added since that would break legacy compatibility.

@cthutu 9 ай бұрын

@ThePrimeTime. Endian is ONLY a thing when you write the value into memory. It determines which parts of the value get written first into memory. So, looking at JUST registers, AH is the top 8 bits of AX, and AL is the bottom 8 bits of AX. But.... when you write it to memory, the AL part is written first, followed by the AH part. Hope that makes sense to you.

@meneldal 8 ай бұрын

Outside of some very weird cases, memory uses a pretty large bus for access, typically larger than the bigger numbers you'd be manipulating, so there's really no endianess there, it is all written at the same time.

@cthutu 8 ай бұрын

@@meneldal That''s not true. It doesn't matter about the bus or register. It's purely if the addressable element in RAM is smaller than the value you want to write to RAM, which order are the elements. Most significant or least significant. For example, if you write a 32-bit value to an 8-bit addressable RAM, do you store the high byte of the 4 first, or last. That is endianness. Nothing to do with busses.

@meneldal 8 ай бұрын

@@cthutu What actually happens on the memory is hidden from you. The memory controller will typically give you a wide bus, and unless you go out of your way to ruin performance and send byte-sized writes, it will take the whole word (or usually even multiple words at once) then actually send data on the cells. I'm just not following what you mean when you say first or last, you're writing the whole word at the same time, even if you could do smaller accesses.

@cthutu 8 ай бұрын

@@meneldal We're not talking about the same things. If you have a pointer, say, to address 42, and you want to write a 32-bit value. One byte goes to address 42, another at 43, and so on to 45. Which part of the 32-bit value that goes to address 42 is the endianess. In big endian, the MSB of the 32-bit value will be written to address 42 to the LSB at 45. And the reverse is true for little endian.

@n00blamer 6 күн бұрын

@@cthutu Right. One is talking about spatial order and other one about temporal order.

@CrassSpektakel 8 ай бұрын

Today there are no RISC or CISC CPUs anymore outside the lowest performance class. All Middle- to Upper Class CPUs are --- Microcode. Which usually means "Peep-Hole-Translating Plattform Code into a VLIW internal Microcode inside the CPU". For the Centaur C6 you could actually use x86 and MIPS code by simply exchanging the Translation Layer inside the CPU which itself was Microcode. Once people suggested (Itanium???) to simply use the internal VLIW code outside the CPU without using the layer of Plattform code but VLIW code is HUGE. A simple "increase register A1" can easily use 256Bit. It doesn't make sense, uses tons of RAM and Cache and Bandwidth on the bus. So if you have to use a compact intermediate code... why not use the one which is already widely used anyway? Therefore it basically doesn't matter if your CPU is ARM, RISC-V or AMD64.

@darlokt51 9 ай бұрын

Great Talk! The article is kinda brain dead to be true. The Chips and Cheese article way better captures it, and thanks to ARM lighting a fire under x86, x86S and AVX10 are coming. A bit to the RISC CISC architecture part, in general architectures are converging, the idea of RISC and CISC from a hardware perspective is truly stupid. RISC has turned a lot CISC and CISC learned from RISC. For the AI folks, you can see an ISA as tokenization for your chips. The frontend and backend are nowadays mostly decoupled and a CISC-like architecture is generally better for branch prediction and prefetching as such ARM has become with every version more CISC and x86 now has to remove their legacy 8,16, etc support to make the decoder fresh and new again, which is coming.

@dan_loup 8 ай бұрын

If you kill x86, computers will turn into the same stupid thing that is the android. Imagine, having to crack your computer like a console just to run linux on it, or even a "clean" version of Windows. It's a quite horrifying future.

@Bokto1 8 ай бұрын

This is a valid concern, but on the positive side, we survived the UEFI transition, and I'd argue it mattered more to the IBM PC ecosystem than the ISA

@dan_loup 8 ай бұрын

@@Bokto1 It's about the complete package indeed. And UEFI was probably only "survivable" because it had to compete with the ol good MBR in terms of not being a locked down hell.

@dan_loup 8 ай бұрын

@truegemuese It would be a bit harder to come up with a system that allows it to be locked down to the OS and still allow you to change the hardware, but it's sadly not impossible.

@alvarohigino 8 ай бұрын

But we already have arm single board computers and it's that way.

@Exilum 9 ай бұрын

My main reason for not reading much by myself is simply curation. I know I can't curate articles better than Prime's community. Also, I might have been programming for 14 years, but I know very well the flaws in my knowledge and experience, so I like having someone else I know the opinions of give their take. I won't always agree with Prime, but I know what Prime's opinions are, so I can know when my analysis is flawed simply by comparing opinions.

@colinmaharaj50 8 ай бұрын

I was there live. I started my tech program in 1988 with the 8085. I got a job as a Telecoms tech, but they had a 8086 XT and I bought a 80286 and started gw-basic / turbo pascal then turbo C then C++. I am still with C++ 30 years later. I also did assembler. The telecoms section went bust but only long after I moved to I.T., I wish I were still in the telecoms side doing PABX stuff. I was the first non-manager / non executive staff to get a PC and Laptop in TSTT.

@yarmgl1613 9 ай бұрын

x86 has no reason to die, performance per watt is still very good and most software is compiled for it already. What needs to happen is the license for x86_64 to be available to any company wanting to implement it

@C.I.G.A.N 8 ай бұрын

it shoul not

@spicynoodle7419 9 ай бұрын

x86 is 40 years of tech debt

@darekmistrz4364 9 ай бұрын

But also 40 years of performance optimizations

@spicynoodle7419 9 ай бұрын

@@darekmistrz4364 why is it getting destroyed by ARM tho? It's more bloat than optimization

@darekmistrz4364 9 ай бұрын

@@spicynoodle7419 Oh because it's so easy to compare x86 and ARM that x86 is getting destroyed? I didn't know I'm chatting with omnipotent being

@boptillyouflop 9 ай бұрын

As opposed to RISC which would have 30 years of tech debt by now.

@spicynoodle7419 9 ай бұрын

@@darekmistrz4364 you are chatting with a being that has written x86 and ARM assembly. Yeah, ARM is superior in every way. I wish RISC-V was chosen as the new architecture though

@revenevan11 9 ай бұрын

Thank you for covering this and for having Kasey on! I'm fascinated by both the history and technical minutiae of processors, too many people take the fact that we've "tricked rocks into thinking using electricity" for granted when there's such complexity within the hardware itself. In particular, I'm fascinated by the era during which ARM was designed, because they had more affordable and compact personal computers on which they could design the chips of the future. The processor industry bootstrapped itself on a macroscopic scale!

@etgaming6063 Ай бұрын

LMAO, I love the guy in chat: "SHUT UP! CASEY IS TEACHING!"😂

@davidspagnolo4870 9 ай бұрын

Yeah, and we all need to use metric too.

@supdawg7811 9 ай бұрын

Lil Endian is what they know me as in the streets

@TheOriginalSnial 7 ай бұрын

Enjoying this, but at about 6:00, the 8086's 8-bit regs are al, ah, bl, bh, cl, ch, dl, dh, because it was possible to individually reference both the lower and upper bytes of the first 4 regs: ax, bx, cx and dx.

@CountSessine 4 ай бұрын

I remember ages ago, in the waning days of our computer science program, explaining to my friend Dom all of the details of the x86 opcode map (I had a lot of experience with x86 assembly language at that point). He was just gob smacked - he couldn't believe how inelegant it was and that it was the dominant architecture. It was a lot of fun explaining it to him.

@benjaminkemper5876 8 ай бұрын

9:38 when you said 'ah ok I'm too stupid to understand that' and immediately moved on without seeking further clarification... that was thought for thought the exact same process in my mind lol, I felt that.

@Momi_V 8 ай бұрын

The way I like to think about it is that essentially all modern CPUs are JIT-Compiling Assembly to their own internal μOps. There are lots of reasons for that and half a century worth of optimizations, but it boils down to essentially having a stable API to the CPUs functions and hiding the implementation details so you can change and improve them (like adding another ALU or completely overhauling your implementation like Bulldozer vs Zen) without having to change the ISA for people to start benefitting from it. The other reason are real-time adaptive optimizations like branch predictions, out of order, prefetch etc. that need a strong front end anyways which (in essentially all relevant implementations) completely overshadows the actual decode effort (Chips and Cheese estimated something in the 1%-5% of core power range for decode). And all of that is true for high performance ARM, RISC-V, etc. as well. The last time someone tried to do it without those optimizations was Intel Itanium and that went horribly... That's not to say other ISAs don't have advantages: ARM can be stripped down enough to perform ok ish on an in order core like A-53 if you really only have 0.5W of power, RISC-V can even be implemented on a microcontroller without it being any sort of "special version". Having an open license makes getting started (and education) easier, less legacy cruft makes implementing the MVP and testing everything less of a hassle (Intel is planning to drop a lot of that with x64s) and some ISA level decisions are more elegant than others (Scalable Vector Extensions vs the AVX512 mess for example). But it's not like x64 is completely doomed. AMD's Processors are performing decently, even compared to Apple Silicon, especially considering the process node differences, target market (Zen scales up to hundreds of Cores, TB of RAM and a LOT of PCIe I/O) and OS level influences (Windows vs MacOS). Intel is in a hole right now, but that does not mean x64 needs to die.

@Bokto1 8 ай бұрын

Itanic was not the last time VLIW has been tried. Cell happened. Elbrus tries. You can still point at them and laugh.

@Momi_V 8 ай бұрын

@@Bokto1 Cell was rather purpose build so I didn't really think of it as an x86 competitor, but I was not aware of Elbrus internals. There goes another evening... But hey maybe someday there will be some kind of breakthrough. It's always good if at least someone is trying out something different.

@cristian505fr 7 ай бұрын

Hi im a CS expect and this is my opinion. I think what the people said here is bad bacause bad things are bad. Good thing they said is that computing is the future of AI and i totally agree with that as a CS expert. Good things are good so i consider them good.

@TimSavage-drummer 8 ай бұрын

I started mucking about with the 6502 MPU purely to get a better understanding of how CPU's work. Much yack shaving later I've have an emulator of the MPU (and basic system) writen in Zig, the biggest chunk of code is the table that maps instructions into micro-ops. Each operation takes a certain number of clock cycles (eg a clock cycle per mico-op). Did I need to do this? maybe not but I have a much better understanding of the details of a CPU from the instruction decoding, micro ops, clock cycles and timing of memory.

@ArbitraryConstant 7 ай бұрын

56:10 I think the decoding issue is largely a question of x86 not being able to know instruction alignments without decoding every instruction in order. x86 does have alignment predictors to try to deal with this but they're not perfect. x86 also has to deal with corner cases like instructions spanning cache lines, and security issues like very long instructions potentially having a different instruction bit pattern embedded inside them, which can be used for what's called a return-to-libc attack. Conversely with an architecture like ARM, especially 64-bit ARM, every instruction is always 4 bytes and they are always aligned to a 4 byte boundary, so you can do as many in parallel as you want. This is one reason Apple has been leading Intel in decode width in recent years, and higher decode width helps with better power efficiency because you can run at lower clock speed and power goes with clock speed cubed.

@colton2432 8 ай бұрын

I feel like the articles argument at 54:35 is that if you have a smaller set of instructions it is easier to optimize the out of order processing that is done by the scheduler. However as Casey mentioned right before hand, these schedulers need to be incredibly performant and therefore the algorithms used to identify dependencies are no where nearly as complicated as the author feels they should be.

@TheDoomerBlox 4 ай бұрын

28:23 shoutouts to the bot br0xamson posting "L take" in stalwart protest against the crowd a modern hero

@NaishoTheNeko 8 ай бұрын

51:16 I believe a die comes from engravers. Where a person makes the negative by removing material to get the reverse image. From what I can tell, die and dye were originally spelled the same. Perhaps it was dye originally meaning what you do with it is dye a paper. Eventually the die changed in name and came to refer to the item itself.

@teropiispala2576 8 ай бұрын

I have been programming since middle of 80's. I have lots of assembler experience from x86, as well as most small computer cpu's and later on embedded systems and signal processors. I have also made bare metal protected mode code for early 386 processors. In a past few decades, I have mainly used C/C++ and done my work mostly on Arm platform. I'm still not big fan of Arm architecture, while practical implementations have had places on automotive systems I work with. It has always been a struggle with performance if you really need to do something heavy, and despite some challenges, there's been need to move on x86 architecture. For my opinion, Arm is remanent from short period where simple instruction set was key to higher clock frequency and performance, but those times are over. Nowadays it's all about parallel execution and efficient use of resources. Everything tends to draw power as long as the unit is powered. Silicon area is quite cheap and parallel cores are common way to increase performance. It's not good for efficient use of resources and definitely don't offer best peak performance. Too bad we are moving away from compiled languages, because there would be lots to be done in parallel execution of normal code. Heavily pipelined architectures already do it in some level, but only trying to solve runtime what the code is trying to do. Compilers could work closer to microcode level and allocate computation units into parallel paths in original code. Big part could be automatic and rest being controlled like OpenMp, but without overhead of splitting execution into different physical cores. Single core could have much more arithmetic and logical units, and some of them could be shared with neighboring cores.

@snakeplissken111 8 ай бұрын

I know this is approached from a different angle. But let me approach the topic from a consumer one: If x86 needs to die, what about current GPU technology? They've been arguing that x86 was dead since thirty years. Yet, progress not only still exists. It also still results in better tech for roughly stable prices. For the same money that could buy you an Athlon 64 CPU 20 years ago, you can still get a current gen Ryzen. Meanwhile, GPUs have "evolved" to the degree that you pay the price of a small Xbox for the entry level tier -- just a decade after even 100 Dollar GPUs were viable, and the midrange started at ~150 bucks. "Sinking prices are a story of the past" -- it doesn't get more "dead end" than this.

@romanstingler435 8 ай бұрын

The term "die" for a silicon chip originates from the process of manufacturing these chips. Here's the connection: Starting Material: Integrated circuits are typically fabricated on wafers, which are thin slices of electronic-grade silicon (EGS). Circuit Formation: The desired circuit patterns are formed on the wafer surface using various techniques like photolithography and etching. This creates a network of tiny transistors, wires, and other components that make up the integrated circuit. Dicing: Once the circuits are formed on the wafer, it's cut into individual pieces, each containing a single copy of the circuit. These individual pieces are called dice (plural: dice or dies). The term "die" comes from the process of dicing, which is similar to how dice are cut from a larger block of material. The wafer is essentially diced into smaller functional units, hence the name "die." Generated by Gemini