The Magic Of ARM w/ Casey Muratori

153,462 views

ThePrimeTime

1 day ago

Comments: 400
@hariangr 1 month ago
I love Casey: well spoken, knowledgeable, easy to follow even for a non-native English speaker (edit: I AM not a native English speaker, sorry for the confusion). Technical enough yet relatively easy to understand.
@pablomelana-dayton9221 1 month ago
Smart yet humble, good combo and makes for good teachers
@mattmurphy7030 1 month ago
@pablomelana-dayton9221 he’s not very humble
@CaseyChesshir 1 month ago
yeah i like him for another reason too
@brod515 1 month ago
@mattmurphy7030 he actually is.
@pepesilvia4564 1 month ago
??? Hearing this guy has been the most infuriating experience this week. He just CAN'T get to the point holy.... he kept rambling, I'm at minute 21 of the video and he STILL hasn't got to the point he wanted to make after minute 3 of the video. He reminds me of the boomer engineers that I work with who just ramble and complain and never get anything done.
@MrWalrus3451 1 month ago
Hour and a half with Casey? YES!
@Anteksanteri 1 month ago
You sound like an anime girl and I'm all for it 👍
@yayinternets 1 month ago
Casey is the best. He's forgotten more than I know. And I'm just a bit behind him, staring down reaching 30 years as a Software Engineer. I am in awe of how verbally articulate he is over such a wide range of knowledge, in depth. Both wide and deep knowledge + articulate is a very rare gift and puts you at the top of the top in Engineering. I've had the good fortune of working directly with several "Distinguished Engineers" over my career and Casey has all of the same qualities. Humble, incredibly articulate to a very detailed level on a wide range of subjects, doesn't talk in absolutes and knows to mention some of the tradeoffs, and knows when he is getting into areas where he might lean on someone else for specific expertise. They are the best people to work with and know how to work with people at different levels without being patronizing or making you feel imposter syndrome. Casey is definitely in that class of Engineering and it's always a treat how well he and Prime work together despite coming from very different backgrounds. Well done as always, gentlemen! I learned so much from this video that I had to come back and edit my original comment to add much more.
@grimm_gen 1 month ago
I could listen to Casey talk for DAYS and not be bored
@RealGrandFail 1 month ago
DA : Once you know the stuff, you will get bored. It's like a machine on repeat.
@grimm_gen 1 month ago
@RealGrandFail I feel like once you know the stuff, the joy comes from teaching others!
@RealGrandFail 1 month ago
@grimm_gen totally agree 💯
@ravenecho2410 1 month ago
mollyrocket is his youtube handle (his wife does a children's novel, I think, if I remember the lore correctly?), he has several amazing vids on there!
@nickwilson7241 1 month ago
Casey is my favorite of your guests. Always love when he's on
@namelessbeast4868 1 month ago
Casey is such a great guest! I always learn so much when I watch these videos
@rdustinlane 1 month ago
As an embedded engineer, this was so great to listen to. It's hard to find good content in the embedded domain.
@getattrs 1 month ago
I remember back when I was in high school trying to get into game dev I found Casey's GJK video. Reading the paper was way over my head with academic language and math symbols - but his walkthrough helped me implement it and EPA. It really helped me see that stuff that seemed untouchable (paper, cryptic code, abstract code) was understandable if you broke it down, took it step by step and tried to visualize. I wish I had more teachers like him back in school, or more material like his available back then. Kids these days are really lucky to have content like this available almost effortlessly
@thesenamesaretaken 1 month ago
It's both a blessing and a curse. Great learning materials are out there and readily available if you know where to look, but knowing where to look is the hard part, with low quality or outright hostile content often winning at SEO and pushing down the gems.
@Loanshark753 1 month ago
The issue of junk search results is only growing, hopefully soon we get hypergoogle.
@Goras147 9 days ago
Wanted to put this on as background, turned out I can sit on my toilet for a whole 1.5 hrs just listening to this. Very informative! Thank you Primeagen and Casey!
@AndrewTSq 1 month ago
People seem to forget that both Intel and AMD already had RISC CPUs in the early '90s. One of Sega's most popular arcade games used the Intel i960 (Sega Rally yeaaaahhhh)
@MartialBoniou 1 month ago
True. I still have an i860 in my NeXTcube. At some point, Intel also made an ARM CPU: the XScale.
@AndrewTSq 1 month ago
@MartialBoniou oh, a NeXT cube :O I love that design. Whenever I did a drawing of a computer I made it like a NeXT Cube :) . I had forgotten about the XScale actually lol :)
@ВладиславДараган-ш3ф 1 month ago
OMG we only have i9 today and there was already an i960 in the '90s
@MrHaggyy 1 month ago
This was a great talk from Casey, especially off the top of his head. There is one thing I would like to add to "the ARM ISA". There is not only one but a bunch of them. The most important ones are Cortex-A, -M, and -R. Their main difference is how you attack performance requirements from a (discrete) math point. Cortex-A is the general compute approach. They are designed to run an OS and are used as CPUs in phones, mobile, or AI clusters. Their goal is pure compute power, even at the cost of determinism or safety, with things like branch prediction, chunkwise caching, etc. Cortex-R is for realtime applications like the ABS/ESP in a car, a flight controller or the primary control of a power/production plant. They are designed to guarantee a computation within a certain timeframe, provide redundancy, private memory for certain things etc. Cortex-M is for microcontrollers. In very broad terms they are a hybrid of R and A. They can map a few realtime features while still doing some general compute when necessary. They are a great choice for a car door with the window control and a few buttons. Intel used to have different sets with x8150, x82 etc. but the portfolio narrowed down to what is known as x86 today, while ARM diversified from the original ARMv1 / ARMv2 chip. They are also roughly the same age, they just grew in different industries.
@Summanis 1 month ago
Ian Cutress did an interview with Jim Keller and has a clip that would make a great supplement to this titled "Jim Keller: Arm vs x86 vs RISC-V - Does it Matter?".
@Bvngee 1 month ago
Casey just seems like such a wonderful human being.
@cryptonative 1 month ago
Casey is better than wikipedia
@Deadshadows 1 month ago
No doubt
@grendel_eoten 1 month ago
Most things are
@asdfghyter 5 days ago
@grendel_eoten nah, wikipedia is way better than e.g. most social media, including youtube comments. wikipedia is also way better than many youtube videos, especially when it comes to stuff like accuracy
@grendel_eoten 4 days ago
@asdfghyter Get a degree in aerospace engineering and try to use Wikipedia for anything related.
@ra_0403 1 month ago
So the processors that fetch multiple instructions in one cycle are called superscalar. And they can either be in-order execution or out-of-order execution. When it is out of order, they undergo register renaming (using a map and a free list of physical registers) to resolve dependencies (other than true dependencies), and get dispatched into a buffer (Register Update Unit) where they wait until their operands are ready. A group of instructions gets picked from this RUU and executed once all the dependencies are resolved. Then, there is an in-order commit for the instruction at the head of the RUU. So we get in-order dispatch, out-of-order exec and in-order commits
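A toy sketch of the renaming step described in the comment above, written in Python for readability. Everything in it is an illustrative assumption: the function name `rename`, the register counts, and the three-instruction program are made up, and the sketch never returns physical registers to the free list (a real core would do that at commit).

```python
from collections import deque

def rename(instructions, num_arch_regs=4, num_phys_regs=8):
    """Rename architectural registers to physical ones.

    Each instruction is (dest, src1, src2) in architectural register numbers.
    Returns the same instructions expressed in physical register numbers.
    """
    # Initially architectural register i lives in physical register i.
    rename_map = {r: r for r in range(num_arch_regs)}
    # The remaining physical registers are free for new destinations.
    free_list = deque(range(num_arch_regs, num_phys_regs))

    renamed = []
    for dest, src1, src2 in instructions:
        # Sources read the current mapping, which preserves true (RAW) dependencies.
        p1, p2 = rename_map[src1], rename_map[src2]
        if not free_list:
            raise RuntimeError("out of physical registers; a real core would stall")
        # A fresh destination register removes WAR and WAW hazards on `dest`.
        pd = free_list.popleft()
        rename_map[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed

program = [
    (1, 2, 3),  # r1 = r2 op r3
    (0, 1, 2),  # r0 = r1 op r2  (true dependency on the first instruction)
    (1, 3, 3),  # r1 = r3 op r3  (reuses r1: WAW with the first, WAR with the second)
]
renamed = rename(program)
```

After renaming, the third instruction writes a different physical register than the first, so it no longer has to wait for the second instruction to finish reading the old value of r1.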
@bentomo 1 month ago
One big extra power burn in x86-64 devices is that the platform is desktop and laptop with expandable RAM. You need more voltage to drive big RAM sticks further away. And ARM has always been on embedded with soldered-down RAM. Intel just demonstrated with Lunar Lake chips, with soldered RAM on the laminate, that saving the memory controller voltage puts them a LOT closer to Apple Silicon in terms of performance per watt. You could bucket a big thing like RAM config in Casey's business explanation. REALLY good explanation from Casey!
@-_James_- 1 month ago
ARM was developed as a desktop CPU though, and that's where it started. On the desktop.
@bentomo 1 month ago
@-_James_- thanks for the correction. It wasn't until 1992 that the Apple Newton was a mobile device with an ARM CPU in it.
@lugaidster 1 month ago
To be fair, mobile atom CPUs used in cellphones of the era were using embedded dram too.
@Loanshark753 1 month ago
First there are cores developed by ARM UK and GPUs developed by ARM Norway, then there are third party designs, by Qualcomm and Apple.
@-_James_- 1 month ago
@Loanshark753 Intel had some ARM designs for a while too after they acquired them from DEC.
@Angel-Fish 1 month ago
That was FANTASTIC!!! Pretty nostalgic too. I was lucky enough to build my 286, 386, & 486 computers back in the day when they came out. If they kept that naming convention, I wonder if the latest computer would be a 10086 or 20086 by now.... I totally had an assembly course in college. It's good to know "nobody" writes that stuff nowadays. If you still do, then consider yourself nobody.
@r.k.vignesh7832 1 month ago
You'd be happy to know that there is a new Intel 285 chip coming out soon! The Core Ultra 9 285 has 24 cores and is among the highest tier of the upcoming Arrow Lake chips.
@Angel-Fish 1 month ago
@r.k.vignesh7832 Yikes! I almost thought they went backwards. 286 was short for the 80286 processor... looks like the 285 is short for 285K (285,000). Not sure if those numbers are a true apples to apples comparison but at least they are headed in the right direction. 😅
@r.k.vignesh7832 1 month ago
@Angel-Fish The K is used to distinguish chips w/ unlocked multipliers from the standard ones. There will also be a 285 non-K. This would have been the Core i9 15900(K) with last year's naming scheme, but they changed it for some reason. Probably to confuse us even more.
@CaptTerrific 1 month ago
I had no idea Godbolt was named after a Mr. Godbolt!!!! He just took the #1 spot on the "best surnames of all time" list from my friend Mr. Goldhammer
@hunterleeves131 1 month ago
Casey’s performance aware programming course is so rad, this dude rules
@ARKSYN 1 month ago
You guys really need to just start a podcast. The chemistry is great, Casey is a blackhole of knowledge and Prime keeps the mood lighthearted and fun.
@tengstrand 1 month ago
This was a great one. I spent thousands of hours programming the 6502, M68000, and M68020 back in the ’80s and ’90s. It was a lot of fun, but nowadays I’m quite happy to be coding in higher-level languages, especially my favourite - Clojure. Still, I sometimes miss the days of programming in Assembly and C. There was something special about having complete control over everything running on the machine.
@kippie80 1 month ago
Yep, past few years, been filling in and expanding knowledge and capability in assembly, for fun
@iraniansuperhacker4382 1 month ago
Assembly is still pretty fun, it's just a lot of instructions to keep track of. I've messed around with doing a basic X11 hello world and it was almost 1000 lines
@toby9999 1 month ago
Same for me... 6502 and 68000. I still prefer lower level coding. Most of my work is with legacy C code and C++
@jonathanhoffstadt1366 1 month ago
It’s time to reboot the “Jeff and Casey” show with the new “Prime and Casey” show.
@jesusmgw 21 days ago
I would love to see Jeff interact with Prime too. And throw in Jon Blow there too.
@mike200017 1 month ago
Great talk! A good follow-up topic might be the memory model differences because (1) it's one of the major differences an actual programmer might hit when porting code from x86 to ARM, and (2) I would imagine it has power consumption implications since x86 chips are required to do more possibly useless work to keep caches coherent.
@TerenceKearns 1 month ago
I love the way Casey explains stuff. I learned so much just from his preamble.
@tikabass 29 days ago
I didn't think much about ARM until I had to program data transfer using DMA. The ARM DMA subsystem is a marvel to behold, a fine piece of art.
@spidermancrawlingtheweb 1 month ago
"I can't believe we're doing all of this just to run JavaScript" lmao
@wmouse 1 month ago
I've been following Casey since he started Handmade Hero and I love the dynamic between you two.
@robertlawson4295 1 month ago
Power consumption is a byproduct of the electronics design (transistor architecture) and NOT ANY firmware or software characteristics. That's why the first ARM chip just happened to be able to operate using stray electric currents from peripheral components on the PCB. That wasn't on purpose but something that was discovered by accident. Well, that sort of discovery now becomes a desired "feature" to pursue on purpose and here we are.
@SimonAyers 1 month ago
That is true. However energy = power x time. So if a process takes longer to execute it can consume more energy even at a lower power consumption. So for a particular application a lower power device is not guaranteed to be more energy efficient.
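A small worked example of the energy = power x time point above. The wattages and run times are made-up numbers chosen only to illustrate the arithmetic, not measurements of any real chip.

```python
def energy_joules(power_watts, seconds):
    """Energy consumed by a device drawing constant power for a given time."""
    return power_watts * seconds

# Hypothetical low-power chip: draws less, but takes longer to finish the job.
low_power = energy_joules(power_watts=2.0, seconds=10.0)

# Hypothetical faster chip: draws more, but finishes sooner and can then idle.
high_power = energy_joules(power_watts=5.0, seconds=3.0)
```

With these invented figures the 2 W device spends 20 J on the task while the 5 W device spends 15 J, so the lower-power part is the less energy-efficient one for this particular job.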
@gurdeepgss 1 month ago
i am 30 minutes in and i think i can listen to casey 10 hours. 👍🏽
@JohanStrandOne 1 month ago
Casey has literally flipped my approach to web performance on its head. Love it!
@UnidimensionalPropheticCatgirl 1 month ago
0:58 Prime being hilarious while ruffling a lot of feathers completely on accident. The risc-v guys really don’t like being called cisc even though it essentially turns into one the moment you include any of the common high perf extensions.
@cyuria 1 month ago
I think there's not really a solid boundary between risc and cisc, but I reckon risc-v at least does it well by splitting the entire isa into extensions which have individual purpose as opposed to have extensions hacked on with new versions or whatever. I believe the beauty of risc-v is that you can create tailored chips for a specific application. For example, you might slap a bunch of vector extensions and parallelisation extensions but leave out stuff like atomics to get a low power, efficient gpu (ofc the technology isn't really developed to that point, but that's the theory anyway). So risc-v is really good for specialised chips as opposed to necessarily desktop cpus, which are pretty much always going to devolve into cisc anyway at some point
@kutto5017 1 month ago
Used the BBC micro B at school.... It was the business.... The RISC based Archimedes was on the horizon and it was truly from another universe 😊. It was so far ahead it was indescribable in the late 80s... It was a jump from 8 bit to 32... That's pretty massive.... Price tag to match.....
@JohnFrancisShade 1 month ago
Love The Primeagen’s priorities on display! ❤
@Karn0010 1 month ago
Finally got time to sit and watch this. I absolutely love these chats with Casey, I always learn so much. He is an amazing teacher and I'm glad there are people out there like him. I'm so glad Prime has him on and that Casey wants to be on as well. Can't wait for the next lesson.
@myentertainment55 1 month ago
Low level programming but in simple language. What a treat! ❤❤❤
@dogman_2748 1 month ago
I wouldn't say LLL talks in an overly complicated way
@damirkekez4692 1 month ago
Another Casey video, this is just what I needed to make my day.
@matthewoldham4804 24 days ago
The amount of preamble here was v precisely calibrated - I’ve never looked at assembly at all, but followed every point made, expertly done!!!
@olivierdulac 1 month ago
Thank you Casey, it's always a treat to learn from you.
@Whatthetrash 1 month ago
Thank you for going slowly to make sure that you don't leave anyone behind, Casey! Thank you!
@benb3928 1 month ago
Thank you for introducing the godbolt decompiler for those of us that didn't know. Having done some x86, PIC and other chip assembly programming in school long long ago (that I hardly remember), this is a great primer for demystifying low-level instructions. There is a small hang-up I'd love to get his take on for clarity. I seem to recall that x86 had a much much larger instruction set, with machine instructions that would take 10-20 cycles to execute, while the more basic (Motorola etc.) chips did not; the more basic chips used, AFAIR, only the accumulator to perform operations (with few exceptions), while x86 allowed a subset of instructions to perform operations entirely within CPU registers without touching the accumulator value. Even ops like addition to direct memory locations were possible (beyond the CPU registers), whereas basic chips would have to move those values from memory to registers, perform the add op, and then the result would have to be moved back from the accumulator to the original mem location. All this to say the idle power draw of the extra transistors that x86 has to perform the ops on so many working registers was significantly higher, and as a result x86 arch was not as power efficient over the long periods where it doesn't use those extra functions. Is that still the case, or is ARM arch now as "bloated" as x86, with a transistor count in the same ballpark order of magnitude as x86?
@GTXDash 21 days ago
Man. This guy is so good at explaining things that even someone such as myself that doesn't code can understand.
@MrWalrus3451 1 month ago
Casey is so powerful flip actually zoomed in when he said it.
@michaelk__ 1 month ago
As someone that did some arm assembly writing for learning and such, this was really cool to listen to.
@jasonchen-alienroid 1 month ago
Ex-system architect here. Instruction sets are not the issue, it's the way it's architected. As an ex-BIOS engineer who worked on APM and ACPI and later specialized in power management on ARM devices, it's just a night-and-day difference in how the two architectures approach design. One example of why instructions don't matter: when I was a BIOS engineer, I worked in x86 asm. When I worked on ARM, I mostly used C/C++. Only rarely did I have to use JTAG and debug in asm, and that was almost never the issue. On power implementation, x86 is almost an afterthought. The ARM platforms I worked on literally think of every possible way to improve power in every iteration.
@Freshbott2 22 days ago
Great to come across someone who’s really familiar with it. For Intel - WHY is it an afterthought? Don’t they have as much to gain from the same? But by the original notion - isn’t it expensive to run all this fancy decode outside the core when modern compilers just aren’t using the breadth of x86? Surely that’s a whole bunch of transistors ARM just doesn’t need to contend with?
@jasonchen-alienroid 22 days ago
@Freshbott2 I didn't work for Intel but I suspect it's purely due to politics. They had an ARM license back in the days when they did the PXA270 and they know how it works. The fact that they sold it off and did not apply much of it to their architecture (at least from an external pov) suggests they just didn't care for it enough. I'd assume they were making so much money from the server side that they just didn't care about the ARM threat. On fancy decode, it's not that expensive to run outside (just think how mobile works). Also not that complex to add these to compilers (maybe back in the days if they add that to gcc). Or the ISA can have prefetch to sort of know this is a certain kind of workload that needs to be offloaded to the correct component/core. Again, just think how mobile works. It has all the features of a PC in a SoC.
@TurtleKwitty 1 month ago
About the arm chip being 0 power usage; if memory serves the anecdote is that the input power of the clock signal for the display was enough to power the rest of the chip
@ControversialOpinion 1 month ago
That's how I remember it. Or was it current on the data pins? Something like that. Not electric fields though, never heard of that. And doesn't really make sense, either. :D
@TurtleKwitty 1 month ago
@ControversialOpinion input signals in general most likely yeah, might have a variation of which input depending on where you heard it from haha
@-_James_- 1 month ago
It was voltage leakage from the support chips that provided enough power for the first ARM samples to run without any dedicated power supply of their own.
@JOHNSMITH-ve3rq 1 month ago
Lmao prime bailing to deal w the kid is brilliant. Love it
@maddada 1 month ago
Casey is right that it's not the ISA that's mostly affecting efficiency. Intel Lunar Lake is an example of how x86 can match or even beat ARM in terms of low power - while keeping backwards compatibility. Intel and AMD just needed to prioritize low power and Apple + Qualcomm finally gave them a real reason to. Lunar lake has similar performance, heat, and battery runtime numbers to M3 and Snapdragon. See Just Josh's lunar lake video for more about this. However ARM is better since it's more open and more competition is happening there to get the best performance per watt.
@user-mikesmith 1 month ago
For the variable length instruction decoding on Intel, the CPU doesn’t necessarily need to decode what the compiler generated, it can theoretically decode something else. The CPU executes what is in instruction cache and the move from memory to instruction cache is slow. In theory you could remove variable length instructions on the fetch to instruction cache and give the CPU fixed-length microcode instructions.
@UwU-f2a 1 month ago
That has cons. Intel CPUs are designed to execute legacy x86 instructions, and these are inherently variable length. Converting instructions into fixed-length microcode would require a significant architecture overhaul, impacting compatibility with existing software and instructions. Intel CPUs already have optimizations like the micro-op cache. This cache holds decoded uops for reuse, reducing the need to repeatedly decode instructions from memory. This already achieves a similar goal of reducing decoding overhead by reusing pre-decoded instructions
@KF1847VM2 1 month ago
> For the variable length instruction decoding on Intel, the CPU doesn’t necessarily need to decode what the compiler generated, it can theoretically decode something else.
No. The incoming instruction stream, regardless of whether it is variable or fixed length, has to be decoded as is.
> The CPU executes what is in instruction cache and the move from memory to instruction cache is slow.
As slow as the memory system can operate at, provided that software does not interfere by making things worse - which sadly is a common case. Without reuse, caching is not faster than directly running off memory.
> In theory you could remove variable length instructions on the fetch to instruction cache and give the CPU fix length microcode instructions.
In practice this is what various platforms did and continue to do in various forms for several decades. What gets fed into the core from the instruction stream perspective is very different to what is actually being acted upon internally.
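To make the decoding discussion in this thread concrete, here is a rough Python sketch of why finding instruction boundaries differs between fixed-length and variable-length encodings. The encoding is a deliberately invented toy (the first byte of each instruction stores its length); it is not real x86, and the function names are hypothetical.

```python
def fixed_boundaries(code, width=4):
    """Start offsets when every instruction is `width` bytes long.

    Every offset is known up front, so each slot could be handed to a
    separate decoder without looking at any other instruction.
    """
    return list(range(0, len(code), width))

def variable_boundaries(code):
    """Start offsets for a toy variable-length encoding.

    Assumption for this sketch only: the first byte of each instruction
    holds that instruction's total length in bytes.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]
        if length == 0:
            raise ValueError("zero-length instruction in toy encoding")
        # The next start is only known after inspecting this instruction,
        # which makes the naive loop inherently serial.
        offset += length
    return starts

fixed = fixed_boundaries(bytes(12))
variable = variable_boundaries(bytes([1, 3, 0xAA, 0xBB, 2, 0xCC, 1]))
```

Real decoders are far more sophisticated than this serial loop, and the micro-op cache mentioned earlier in the thread is one way the cost of repeated decoding is avoided.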
@yourposer 1 month ago
i 💜 Casey Muratori's deep dives
@ilu1994 1 month ago
Love to see Casey, please come on more often!
@CjqNslXUcM 1 month ago
x86 is like utf8 and ARM is like utf16
@ravenecho2410 1 month ago
As someone with like a datascience/machine learning background, I always have no idea where Casey is going but I always love to come on the adventure and I always learn something new -- pulling up the webtool and following and playing along really helps with this video! Casey's channel is "molly rocket" btw, it always escapes my brain and then I remember -- in case you are looking for it u.u
@liquidpebbles 23 days ago
Casey is the GOAT. I can't get enough
@Ahsan_Fazal 1 month ago
ANOTHER CASEY VIDEO!!! ❤🎉
@avidessauer154 1 month ago
You should have someone on to talk about the difference in memory models (x86 strong, arm/riscv weak). Also worth touching on how the C11 memory model's adoption has made far more software compatible with weak memory models.
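As a rough illustration of the strong-versus-weak distinction raised above, the Python sketch below enumerates the results of the classic message-passing litmus test on a toy machine. It is a crude model under stated assumptions: "strict" keeps each thread's operations in program order, "relaxed" lets each thread's two operations swap freely, and neither mode is an accurate model of real x86 or ARM. The names `WRITER`, `READER`, `interleavings`, `run`, and `outcomes` are invented for this example.

```python
from itertools import permutations

# Message passing: the writer publishes data, then raises a flag.
WRITER = (("store", "data", 1), ("store", "flag", 1))
# The reader checks the flag, then reads the data.
READER = (("load", "flag", "r1"), ("load", "data", "r2"))

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's own order."""
    if not a:
        yield list(b)
    elif not b:
        yield list(a)
    else:
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest

def run(schedule):
    """Execute one global schedule against shared memory; return (r1, r2)."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for kind, location, operand in schedule:
        if kind == "store":
            memory[location] = operand
        else:
            regs[operand] = memory[location]
    return regs["r1"], regs["r2"]

def outcomes(allow_reordering):
    """Collect every (r1, r2) result the toy machine can produce."""
    if allow_reordering:
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        writer_orders, reader_orders = [WRITER], [READER]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(w, r):
                results.add(run(schedule))
    return results

strict = outcomes(allow_reordering=False)
relaxed = outcomes(allow_reordering=True)
stale_read = (1, 0)  # flag observed as set, but data still old
```

In the strict mode the stale read never appears, while the relaxed mode produces it. That is the kind of outcome x86's ordering rules exclude for this pattern and that ARM code has to rule out with barriers or acquire/release atomics, which is why the C11 memory model matters when porting.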
@KvapuJanjalia 1 month ago
1:17:08 I remember when Intel invented new instructions specifically for XML parsing. I would not be surprised if we see JSON parsing instructions in the next i9 or something. EDIT: I exaggerated quite a bit: SSE4.2 text processing instructions are general purpose, not intended for XML processing only.
@poteitogamerbr2927 1 month ago
Seriously? Tried to google it to find which instructions do this but found nothing. Do you have sources?
@KvapuJanjalia 1 month ago
@poteitogamerbr2927 SSE4.2 text processing instructions: PCMPESTRI, PCMPESTRM, PCMPISTRI and PCMPISTRM. I guess when they were introduced, XML was the new hotness, and these were marketed accordingly. Looks like they actually are general purpose and can be used for JSON processing too.
@mss664 1 month ago
@KvapuJanjalia Those are really just for string searching. You can use them to implement for example strpbrk. And they have a variant for null-terminated strings.
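For readers who have not met `strpbrk`, this is a plain Python sketch of what the C function computes, returning an index instead of a pointer. The helper name `strpbrk_index` is made up here, and the loop only shows the scalar semantics one character at a time; it makes no claim about how the SSE4.2 instructions are actually used inside any particular library.

```python
def strpbrk_index(text, accept):
    """Index of the first character of `text` that appears in `accept`, or -1."""
    for i, ch in enumerate(text):
        if ch in accept:
            return i
    return -1

# Find the first structural character in a small JSON-like string.
first_delim = strpbrk_index('{"key": 1}', ":,")
no_match = strpbrk_index("abc", ":,")
```

The appeal of a vector instruction for this job is that it can test a whole block of characters against the accept set at once rather than looping per character.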
@poteitogamerbr2927 1 month ago
@KvapuJanjalia thanks, it seems very cool. I wonder if compilers like gcc actually optimize say C code into those instructions since they are very specific or you must call them directly.
@morosis82 1 month ago
@poteitogamerbr2927 that might depend on a couple of things. As far as I understand, if it's a fairly widely supported instruction then your compiled binary may contain it with a fallback for a chip that doesn't support it. If it's quite specific you might need to let the compiler know through flags to include it.
@burkskurk82 1 month ago
I can’t shake the feeling that this discussion becomes second guessing after some 40 mins. It’d be good to invite Jim Keller on the show.
@TheBadFred 1 month ago
In the early 80s, if you had a Commodore 20/64 8-bit with a MOS6510 and your programs had to run full speed, there was nothing but assembler.
@michaelday341 1 month ago
I bought "Creating Arcade Games on the Commodore 64," and I think I also bought a machine language book, too. Sadly, I didn't get very far with either book. But, I remember the excitement I had finding out that books like that existed, because I really wanted to program games. Too bad I didn't have the skills that others did.
@TheBadFred 1 month ago
@michaelday341 Basic was better than nothing.
@eavdmeer 1 month ago
Exactly this got me into assembler on the C64. Pure performance poverty 😂 Not even a compiler. Just writing code directly in my Power Cartridge monitor.
@eliasepg 1 month ago
I loved this talk, I learnt a ton, and it helps me understand everything so much better
@asdfghyter 5 days ago
this is absolutely fantastic! very informative!
@hburke7799 1 month ago
thank you, finally something besides the typical ARM RISC copypasta that's 30 years out of date.
@marble_wraith 1 month ago
Godbolt sounds like a man who is blazingly fast!
@skeleton_craftGaming 1 month ago
Fun fact, the a in arm originally stood for acorn, the makers of the BBC micro... The first arm chips were literally acorn asking how they could make a sequel to the BBC micro [or one of its successors. I'm not British or a computer historian for that matter]😊
@nexovec 1 month ago
Legendary video with a mandatory algorithm boosting comment from me.
@jordanjackson6151 1 month ago
SO glad this is finally up. ARM is on my to 'RUN' list. It's apparently effective at reading Malware. I've been spoiled by Lua, Python, JavaScript and so on.
@przemekkobel4874 1 month ago
56:00 A guy in a documentary I saw said they forgot to connect the Vcc rail, but the first Acorn RISC Machine chip was able to run on currents passing through pull-up resistors (stuff that stabilizes bus state).
@BenniK88 1 month ago
Love it, the content we need. Thx ❤
@renecouture3719 1 month ago
Lot of knowledge and history here! Sounds like ARM instructions are a better design, I'll keep it in mind
@xenmax 20 days ago
About the ARM no-power anecdote: there is an interview with one of the engineers who worked on the first ARM chip at Acorn (ARM used to be Acorn RISC Machine) in which he explains that when they first tested the first prototype of the chip they measured 0 mA of current going into the power rails. They soon realized it was because the power rails were disconnected, but the chip was working anyway because current was flowing in through other pins on the package. It doesn't mean that the chip used virtually no power, only that it used little, so little that the input signals and capacitors alone gave it enough to work with, without the power rail connected to anything.
@Kniffel101 29 days ago
56:00 There's a great 3-part video interview with Sophie Wilson on the channel "Charbax". If I remember correctly, she talks about the low power ARM stuff in one of those.
@Nightwulf1269 1 month ago
Well....x86 has around 1600 instructions, Arm around 150 and RISC-V (GC) around 40....but that's not the sole deciding thing. On RISC-V the instructions are no longer human readable (if that's even possible) in their hexadecimal form and are optimized for the instruction decoding logic to be as simple as it could get. So if we compare those, compare comparable things. But other than that detail, fantastic video and great knowledge shared by Casey! Thank you very much!
@mlv60 1 month ago
i cant get enough of casey talking about computers ❤
@chaitanyakumar3809 1 month ago
If you get Casey on again for a similar topic, I think reading through and discussing David Chisnall's article "There's No Such Thing as a General-Purpose Processor: And the belief in such a device is harmful" would be interesting -- he goes into things like the energy impact of complex decoding machinery.
@FunwithBlender 1 month ago
"I only look at it occasionally" lol after that knowledge bomb
@SergioStankevich-ef2mf 17 days ago
I adore the Casey streams and the rabbitholes²
@PavelAslanov 1 month ago
Maybe I do not understand all the details, but I think memory model is way more important in the limit. x86 is way more restrictive on how it can reorder memory access (for atomic operations it will always be memory_order_seq_cst) in spirit it is very similar to GIL in python. While arm is free to do way more reordering and given how slow memory access is I can see how this difference can bring substantial edge in performance.
@steffennilsen2132 1 month ago
I learn so much from this, quality content
@brod515 1 month ago
By the time I reached @<a href="#" class="seekto" data-time="1761">29:21</a> I was absolutely excited. This is the most interesting stuff.
@rev0lu7ion 1 month ago
i love hearing casey talk about anything
@_vdm_ 1 month ago
Love these videos with Casey
@kippie80 1 month ago
It is lower power because it has a lower transistor count AND there are fewer switching transitions per productive computation. Initially, at least. Then yes, the trend toward lower voltages and the physical layout of transistors. Still, those initial design constructs count. Also, switching to Thumb mode is a way to power down extra circuitry in the chip. Power is burned when a transistor transitions.
@orthodoxNPC 1 month ago
<a href="#" class="seekto" data-time="147">2:27</a> 1500- 3600 instructions vs 240 instructions... yea real hard to justify "reduced" instruction set
@mrrolandlawrence 1 month ago
I loved coding ARM assembler. Just add conditional suffixes to any instruction to skip or execute it (B for branch ... BLE, BEQ ...). So simple. It avoids branch hell in CISC; Wilson did a top job on the instruction set. <a href="#" class="seekto" data-time="3265">54:25</a> Not quite true. On the older CPUs, having to wait 11 cycles to complete a single op was not uncommon, especially when you had NOP instructions.
> Wilson created the ARM instruction set to be the "programmer's dream wish list". Hauser did the layout for ARM 1.
> The first ARMs had minimal microcode.
> Lower IPC. RAM at the time for other processors was waiting for those 11 cycles to complete before getting the next op.
> Optimising your compiler for ARM: easy peasy. Optimising for x86 with those dozens of sets of extensions: like playing hopscotch in a minefield. This is why AMD & Intel came together, because this is now so bad a problem.
> The tale of zero watts is true, kinda. When they measured the power draw, they were very worried about the power and heat, because if it was high it would require a much more expensive ceramic housing. Anyway, the digital meter measured 0000 watts. True story. After some investigation, they found that the power was actually leaking in via the address bus, and that was enough to power the CPU at full speed; the CPU power pin had 0 V because of a board defect.
@ehh54 1 month ago
Love the opening 😂
@CoderDBF 1 month ago
Thank you, Casey.
@Barnardrab 26 days ago
<a href="#" class="seekto" data-time="274">4:34</a> to <a href="#" class="seekto" data-time="280">4:40</a> - That's when Prime started getting off track and Casey canceled that tangent in a heartbeat.
@UKCheeseFarmer 1 month ago
Anyone interested, check out the British film Micro Men (available on KZbin legitimately). It covers the story of the companies and engineers of Acorn and Sinclair, and the development of the BBC, Spectrum, and Electron (Clive Sinclair, Chris Curry, Steve Furber and Hermann Hauser).
@nick4uBB 1 month ago
That was great. I learned a lot - thank you!
@muayyadalsadi 1 month ago
<a href="#" class="seekto" data-time="3263">54:23</a> At minute 54, orthogonal memory access still hasn't been mentioned. <a href="#" class="seekto" data-time="3669">1:01:09</a> Besides the decoders, the orthogonal memory access modes in x86 are why it needs more transistors to implement.
@AK-vx4dy 1 month ago
"Read assembly language fluently"
@sweep- 1 month ago
Prime is the best talk show host!
@FunwithBlender 1 month ago
love Casey! he has a big brain
@martinrodriguez1329 10 days ago
What I take from this is that x86 comes from a very old place where instructions didn't take more than 2 bytes, but as time went by, the need for bigger instructions led to a solution built for backward compatibility, which made it take more clock cycles to figure out what an instruction is trying to do. ARM, on the other hand, decided (probably from experience) to keep a fixed instruction size of a length they thought would be enough, thus making them all take the same time, which would be (I would assume) 1 clock cycle. The other thing I take from this is that there's no big need for better CPUs, and the companies are relying on programmers wasting resources so that people need better products because of that inefficiency and the marketing can keep going, which is... concerning.
@the_dude_josh 19 days ago
More Casey please!!
@dimedriver 1 month ago
The original TDP design goal for the first ARM chip was under 1 watt. This was a design goal to keep the packaging cost down. It's covered in a few interviews with the original designers. I think they wanted a plastic package rather than a ceramic one. The ceramic package would cost $10 to manufacture vs. $1 for the plastic. Market forces pushing later designs is probably correct. They had to design each chip to come in under an already low bar. Intel, on the other hand, could slip by a few watts or just bin the chip at a higher TDP as long as performance was somewhat in line for that TDP.
@alh-xj6gt 1 month ago
Casey
@UnidimensionalPropheticCatgirl 1 month ago
It's really good in C and C++ if you have the right tools (although even stupid gdbgui can nowadays line up source lines with the assembly they produce, and simply gives you an "optimized out" message when that has happened) and your compiler knows how to output debug symbols for them (most tools just use either DWARF or sometimes STABS). You can usually get a general idea of what's happening and eventually learn to see when the codegen did something stupid.
@morosis82 1 month ago
C and C++ are both low-level languages that were designed to abstract assembly. So you don't want something interpreted, because then you're actually profiling the interpreter, not the code you wrote, and you probably want something that is fairly low level and doesn't abstract away too much, or you won't be able to match your block of code to the assembly output. When is it good? Generally I'd argue any time you're working on a platform with limited resources, or you're interested in low-level performance for some reason: there's a piece of code you need to run at scale, and to avoid needing to double your infrastructure, optimising this hot loop is worth doing. Because it's always a tradeoff. I know Prime and Casey like to say it always matters to optimise, but often the extra cost of the hardware required because the most optimal language was not chosen is a fraction of the cost of the engineer needed to do said optimisation.
@SETHthegodofchaos 1 month ago
Great stuff! More Casey please :)
@msclrhd 1 month ago
I suspect that things like the branch predictor, instruction pipelining, and other chip designs/architectures would affect power more than the ISA. If you want the instructions to run faster you need to do more work on the CPU that may consume more power. If you want to use less power you are constrained in the kinds of CPU-level optimizations you can do.
@SimonM90 1 month ago
It's amazing that we get to tap into this man's knowledge for free
@garethlagerwall 12 days ago
Love these discussions