why can’t computers have thousands of cores?

Views: 713,932

2 years ago

If you're watching this video on any device made in the last 10 years, be it a desktop, a laptop, a tablet or a phone, then there is an extremely high chance that your device is powered by a multi-core processor. Since IBM released the first dual-core processor, the POWER4, in 2001, and the first dual-core desktop chips arrived in 2005, it has become more and more common for computer processors of all varieties to be multi-core. This is in direct contrast to laptops in the early 2000s, like my iBook G4 for example, which was powered by a single-core PowerPC processor at around 800MHz. Nowadays, it is common for any desktop to have at least 4 cores, clocked easily into the GHz range.
But what does it mean for a processor to have multiple cores? How does a processor with multiple cores work? Why are more cores better than just one? How many cores are too many? These are all really important questions, and, like you, I was curious to find the answer.

Comments: 1,300

@utubekullanicisi 2 years ago

Both Intel and AMD are rumored to release server processors (codenamed Sierra Forest, and Turin respectively) with more than 200 cores in the next few years (as soon as 2024). Servers will continue to scale well and make use of as many cores as you can give them.

@kayakMike1000 2 years ago

Aren't these intended for data centers where customers lease VMs or some other slice? AMD has encrypted RAM...

@cassandrasibley228 2 years ago

This video is about home and personal computers. Obviously industry hardware is gonna be a lot tankier

@kayakMike1000 2 years ago

@@cassandrasibley228 well, some of those Xeon server processors end up in high-end workstations. I suspect higher-end workstations might have to take a middle road between core count and individual core performance.

@AnarexicSumo 2 years ago

@@kayakMike1000 Can't I just enjoy learning about the extreme limit to the same tech without people going "Uhm ackshually that's not designed for consumers"

@littlemeg137 2 years ago

Sun/Oracle had a 128 core SPARC64 chip over a decade ago. I've still got one of those servers in my basement.

@davidthacher1397 2 years ago

Technical Tradeoffs:
1. Power - Power Consumption / Thermal
2. Area - Core Size / Cache / Memory Bandwidth / IO interconnect / Yield
3. Performance - Architecture / Instruction Set / Clock Speed / FPGA / Multicore / CMT / NUMA / SIMD
Business Tradeoffs:
1. Algorithm Capability / Application
2. Market share / Manufacturing Size
3. Time to Market / Training

@leosmi1 2 years ago

Thank you

@brolysmash9333 2 years ago

bro you’re the best. Thanks for sharing this up. I’m a network engineer and didn’t know nothing about that.

@larrydavis3645 2 years ago

As a former programmer, not all functions of a program can be run in parallel. Sometimes a function needs to wait for another process to finish before it can proceed.

@pwnmeisterage 2 years ago

You just can't count or calculate the next number before you've finished calculating or counting the one before it. There is no logic, no clever math or algorithm or brute force which can speed up some simple processes. The complex things just have to wait until the simplex things get done.
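
The dependency described in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `step`, `serial_chain`, and `independent_map` are hypothetical names invented for the example, and the arithmetic inside `step` is arbitrary. The first function has to run in order because each iteration needs the previous result; the second can be handed to a pool of workers because no item depends on another.

```python
from concurrent.futures import ThreadPoolExecutor


def step(x: int) -> int:
    """Arbitrary stand-in for 'some work' (hypothetical, chosen for illustration)."""
    return (x * x + 1) % 1_000_003


def serial_chain(x0: int, steps: int) -> int:
    """Each iteration needs the previous result, so the loop cannot be split up."""
    x = x0
    for _ in range(steps):
        x = step(x)  # depends on the x computed one iteration earlier
    return x


def independent_map(values: list[int]) -> list[int]:
    """Every item is independent, so the work can be spread across workers."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(step, values))


print(serial_chain(2, 3))          # 2 -> 5 -> 26 -> 677
print(independent_map([0, 1, 2]))  # [1, 2, 5]
```

Threads are used here only to show the shape of the two problems. In CPython they would not make CPU-bound work like this faster, so treat the pool as a placeholder for real parallel hardware.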

@MrZnarffy 2 years ago

That's if you iterate... But try using a functional language where even iteration is done by recursion...

@larrydavis3645 2 years ago

@@MrZnarffy Thank you for the feedback.

@duaanekobe2773 2 years ago

It can be endless, 12 is the basic max, 24 = 2x all at very stable more. times infinity Now I seen 36 max and it gets bugs. a controller separating 24 and next 24 = 58, etc... So buffer and code (stable 24 time x), makes best code function. now is the actual program and controller(s), Think A,B,C and the programs, export

@larrydavis3645 2 years ago

@@duaanekobe2773 Thank you for the feedback. I did most of the programming on mainframe computers and the programs there were extremely linear in nature. We use subprograms for common functions meaning the main program was in a wait state waiting for the subprogram to complete.

@veleriphon 2 years ago

We see the cores to code limit already with Threadripper 64 core, 128 thread units. It's hilariously overpowered for most tasks.

@SandTurtle 2 years ago

I feel bad for people who buy a Threadripper then realize their favorite games either don't support multithreading, or only support 1 or 2 extra threads for main logic.

@giahuy8701 2 years ago

@@SandTurtle of course, Threadripper is not for gaming

@saricubra2867 2 years ago

@@giahuy8701 No way to blame sh*tty game code optimization on monster CPUs. There are games still struggling on CPUs with 4 cores due to bad optimization and very low CPU use.

@SandTurtle 2 years ago

@@giahuy8701 ye ik but I've heard of people tryna buy them for gaming

@ed_iz_ed 2 years ago

@@giahuy8701 games can EASILY make use of multiple threads

@badass6300 2 years ago

Also, a big factor is that many programs have linear logic. Amdahl's law shows how well a task scales with multiple cores depending on how parallel it is. For 50% parallel tasks above 4 cores is pointless. For 75% parallel above 16 cores is pointless. You just don't gain performance and that is baked in the logic of the task. Many cores are great when doing multiple of the same task without caring which task is completed first.
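
Amdahl's law, as mentioned in the comment above, can be worked through numerically. With a parallel fraction p and n cores, the predicted speedup is 1 / ((1 - p) + p / n), and it can never exceed 1 / (1 - p). The sketch below uses hypothetical function names and simply evaluates that formula; it says nothing about any particular program.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law for a task on `cores` cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    return 1.0 / (1.0 - parallel_fraction)


for p in (0.50, 0.75, 0.95):
    row = ", ".join(
        f"{n} cores: {amdahl_speedup(p, n):.2f}x" for n in (4, 16, 64, 1024)
    )
    print(f"p={p:.2f} -> {row} (limit {max_speedup(p):.0f}x)")
```

For a 50% parallel task the formula gives 1.6x on 4 cores against a ceiling of 2x, and for a 75% parallel task the ceiling is 4x, which is the diminishing return the comment is pointing at.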

@mewsermeow8683 2 years ago

That and the fact that once you start getting into problems that are highly parallelizable, you'd just use a gpu anyway.

@badass6300 2 years ago

@@mewsermeow8683 if the gpu has the instructions for it, but 99% of the time yes.

@hjups 2 years ago

@@badass6300 It's not about if the GPU has the instructions, it's largely about the type of problem too. GPUs for example, don't do well with highly divergent streams, but do well with highly uniform streams. Modern CPUs can often do much better with divergent streams due to their internal out-of-order nature, and throwing more CPUs at the problem has almost perfect scaling with Amdahl's law in such problems (very small sequential part - usually global book-keeping).

@badass6300 2 years ago

@@hjups True, but GPU architecture is getting close to CPU architecture with the passing generations. AMD GPUs since RDNA1 have hardware schedulers and might get OoO Execution in the future. Then again with chiplets they might get a whole CPU to themselves for certain tasks. Or vice-versa, integrated GPUs might get good, or both.

@hjups 2 years ago

@@badass6300 Not really. GPUs are fundamentally different than CPUs due to their parallel / vector nature. Some improvements have been made to handle thread divergence, but they are never going to be as robust as a CPU... otherwise... they would be CPUs... As for OoO, both NVidia and AMD GPUs do OoO internally. It's not something advertised though.

@roelesch 2 years ago

You're more describing the limits of the Von Neumann architecture and our current (mostly sequential) models than anything else imho. Have a look at Erlang and the Actor model, and I think you'll agree that processors can scale just fine if we rule out shared memory.
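
The actor idea mentioned above can be sketched roughly in Python. This is not Erlang and not a real actor library: `counter_actor` and `run_demo` are hypothetical names, and the isolation here is by convention only, since Python threads do share a heap. The point is just the shape: the actor owns its state, and the only way to affect it is to send a message.

```python
import queue
import threading


def counter_actor(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Owns `total` privately; other code can only influence it via messages."""
    total = 0
    while True:
        message = inbox.get()
        if message is None:    # None is this sketch's "stop" message
            outbox.put(total)  # reply with the final state, then exit
            return
        total += message


def run_demo(values: list[int]) -> int:
    """Send each value to the actor as a message and collect its reply."""
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    worker = threading.Thread(target=counter_actor, args=(inbox, outbox))
    worker.start()
    for value in values:
        inbox.put(value)
    inbox.put(None)
    worker.join()
    return outbox.get()


print(run_demo([1, 2, 3]))  # 6
```

Because no two actors touch the same variable, there is no lock to contend on, which is the scaling property the comment alludes to.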

@kayakMike1000 2 years ago

The functional wizard has spoken. (Pay no attention to the man behind the curtain)

@Handelsbilanzdefizit 2 years ago

It's called memory driven computing. Very smart. HP tried this for a while. I have no idea what happened to this.

@roelesch 2 years ago

@@Handelsbilanzdefizit Thanks for sharing!

@StCreed 2 years ago

Occam on transputers already solved a lot of issues with programming. Too bad it never took off.

@roelesch 2 years ago

@@StCreed Interesting, thanks for sharing!

@AlessioSangalli 2 years ago

I always categorized "asymmetric" systems as the ones that, while having multiple cores, do not have cache coherency, so it's up to the programmer to synchronize the cores. I once worked on a system that was running Linux on one core and an RTOS on the other, with independent MMUs.

@LowLevelLearning 2 years ago

Two OS's on separate cores, very interesting.

@llothar68 2 years ago

Yes, I think Apple will have to go this way. We can see on the M1 Ultra that they hit the limit for chip interconnect already. But it could be nice to make a start on getting "blade computers" into the world of desktops. We had them in servers for a long time. Multi-socket boards still try to do cache coherency. But unfortunately desktop computers aren't there yet.

@MatthijsvanDuin 2 years ago

Embedded and mobile SoCs quite commonly can have lots of different cores with little or no coherency. For example TI's TDA4VM has (counting only freely programmable cores):
- dual-core Arm Cortex-A72
- three dual-core Arm Cortex-R5F subsystems
- one TI C71x DSP
- two TI C66x DSPs
- two real-time subsystems with 6 TI PRU cores each
with cache coherency only available between the Cortex-A72 and the C71x as far as I understand (with snooping of main memory access by other cores or DMA, but no coherency with local caches of e.g. the R5F or C66x subsystems), while many of TI's older SoCs have no cache coherency whatsoever.

@kippie80 2 years ago

This is already done with Intel and Apple chips, for security. I forget the name of Intel's, but Apple put its T2 chip into the Mx-series CPUs.

@Mrcrappyfuntastic 2 years ago

Didn't the PS3 have a similar issue too?

@benandrew9852 1 year ago

I've recently started a job as a technical support engineer / technical writer working on complex digital signal processing applications. Videos like this, like yours, are exceptionally valuable to me as a non-programmer. There are limitations to design, implementation and efficiency that are contingent on factors entirely within low-level hardware programming, and having them explained so succinctly makes my job way easier, because I'm being provided with a higher level understanding that I can pass on to my reports. Props. And, on a more personal-craft level, the quality of your videos in terms of rapidly explaining complex topics through efficient use of graphics and constrained use of jargon is very inspiring. Well done.

@MichaelBristow137 2 years ago

My first computer had 48k (it was an Apple II+ with 16k extra memory). I remember learning some assembly language. Now I have a multi-gigabyte memory (255 GB SD, plus 128 internal) phone which takes 8 MB photos... I am so amazed at how far we've come and what the computer is actually doing to even display what I'm typing right now. It's mind-bogglingly amazing...

@EdKolis 1 year ago

I remember the animated intro for Megaman X saying back in 1993 that X had 32,768TB of RAM, and I had to look up what a terabyte was and I was like "lol what". Now that actually seems feasible in the not too distant future - will AI advance to the same point as X too?

@rodjacksonx 1 year ago

My first was an Atari 800, I'm pretty sure it had 64K. It was a fossil even when I got ahold of it. I recently fulfilled a childhood dream by building a new system and just maxing out the RAM for the heck of it. 128GB has never felt so good!

@cpK054L 1 year ago

@@EdKolis well... 64-bit operating systems won't go away for a LONG time... as they can address 16 exabytes (still at least 6-folds away). Nothing says you can't have 32 exabytes of RAM... the question is... why? You might as well live alone in a 100,000 sq ft mansion and ask yourself... why?
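
The 16 exabyte figure above follows from simple arithmetic, sketched here under the assumption of flat byte addressing with a full 64-bit address and binary units (so strictly 16 EiB). Current hardware generally implements fewer address bits than that, so this is a ceiling, not a description of any real machine. The names below are hypothetical.

```python
GIB = 2**30  # one gibibyte in bytes
TIB = 2**40  # one tebibyte in bytes
EIB = 2**60  # one exbibyte in bytes

# Assumption: every one of the 2**64 possible addresses names one byte.
ADDRESSABLE_BYTES = 2**64


def times_larger_than(ram_bytes: int) -> int:
    """How many times the full 64-bit address space exceeds a given RAM size."""
    return ADDRESSABLE_BYTES // ram_bytes


print(ADDRESSABLE_BYTES // EIB)         # 16
print(times_larger_than(128 * GIB))     # 134217728
print(times_larger_than(32_768 * TIB))  # 512
```

So the 128GB build mentioned earlier in the thread is about 134 million times smaller than the address space, and even the 32,768TB figure quoted from Megaman X would fit 512 times over.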

@embrikchloraker8186 1 year ago

@@EdKolis What also amuses me is that, even in the future with specs like that, they're apparently still using DOS interfaces.

@Bobby-fj8mk 1 year ago

I learnt Intel 8085 assembly language back in the early to mid 80s. What is actually going on is so simple on an instruction by instruction basis. At some point in history the CPUs were able to allocate tasks from a single program to multiple cores all by themselves without a programmer writing instructions for them to do that.

@dannygjk 2 years ago

Back in the 80's I read a magazine article about the 'Connection Machine' which had 65536 processors but each processor wasn't like a core that we think of these days. Each processor was a tiny simple device which operated in a massively parallel architecture. Such machines had a limited practical value since they were specialized for a narrow range of problems and were also limited by being a 'hard-wired' architecture. Right now I can't think of a better description but I do know I should word it differently. I vaguely remember it had clever solutions to how to break down tasks and how the machine's processors worked together. It makes me think of things like ant colonies.

@ivanscottw 2 years ago

Errr... GPUs ?

@dannygjk 2 years ago

@@ivanscottw I don't remember if the connection machine was analogous to a GPU because I don't remember the details of the architecture.

@mateusvmv 2 years ago

Sounds more like a cluster

@dannygjk 2 years ago

@@mateusvmv iirc the whole machine's architecture was roughly analogous to a GPU. It wouldn't be like what people think of as a cluster we have these days. Each processor was a very simple device nowhere near what a processor is in a cluster we think of these days. It was huge tho compared to a GPU which isn't surprising since the first one was built in the 80's.

@littlemeg137 2 years ago

The whole point of the Connection Machine's hypercube topology was to allow programmers to define the optimal architecture for the problem they were trying to solve. Unfortunately, very few HPC programmers of the time could make the cognitive leap to this model from Fortran on vector machines.

@dmitrykargin4060 1 year ago

Scientific computing guy here. Most often we hit RAM bandwidth limit. Sometimes we use all bandwidth by a single core with optimised AVX2 code and perfect memory layout. Using more cores will just slow down everything until you switch to a platform with more DDR channels.

@TrippTech 1 year ago

(electrical engineer here) LOVE this, great explanation!!! One thing I would have mentioned, especially when talking about single core chips, is "out of order" execution, where the chip executes instructions as soon as they're ready, rather than everything waiting in a queue. Probably one of the biggest advances in chip design in history.

@LilacMonarch 1 year ago

The "number of transistors doubling every 2 years" might already be hitting its end. The problem is in order to add more, they have to be made so small that it's impossible to keep the circuits properly separated. The gaps are so small that electrons easily jump across, causing shorts. Maybe we will see an increase in larger sized CPUs, but that will have its own problems.

@NFchegg 1 year ago

Chiplets

@AlMcpherson79 1 year ago

Improve efficiency without improving capability to the point that we can start stacking the processors... resulting in THICC CPUS.

@LilacMonarch 1 year ago

@@AlMcpherson79 now that sounds like a thermal nightmare

@jsmith8147 1 year ago

Just have to start making subatomic transistors, I'm sure with enough money and engineers someone will crack it with some room temp quantum engineering.

@KeinNiemand 1 year ago

The number of transistors still increases so we haven't hit the absolute end yet, but it probably has already slowed down from the doubling every 2 years of Moore's law.

@pwnmeisterage 2 years ago

GPU and SPU cards already pack hundreds or thousands of "cores" onboard. They can only process simplex tasks, not complex tasks, but they can stream their outputs in near-realtime. They suck a lot of power and spew a lot of heat while working at full load.

@gantz4u 2 years ago

Which they've been laying the groundwork for since single core, with things like liquid cooling and cryogenic cooling. Even my air cooler block is light years above what we had in the 1990s, to where it's on par with, if not out-cooling, a 1990s water cooler.

@mm2f419 2 years ago

what are spus?

@hjups 2 years ago

@@mm2f419 I think it should have been "DPU" not "SPU". So the smart network cards like Nvidia's BlueField.

@JorgetePanete 1 year ago

simple*

@CocoaEm 1 year ago

@@JorgetePanete they do lots and lots of simplistic operations just when its added up its complex.

@RonJohn63 1 year ago

AMD also released their first dual-core CPUs in 2005. (Of course, not everyone instantly bought them...) Another issue with huge core counts is cross-core communication: threads usually want to talk to each other, and the wiring between all those cores gets crazy. You effectively get a traffic jam in there...
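
The wiring problem described above can be put into rough numbers. The sketch below counts links for two textbook layouts: a full point-to-point mesh, where every core is wired directly to every other, and a square 2D grid, where each core only talks to its neighbours. Both are simplifying assumptions for illustration; real chips use rings, meshes, crossbars and hierarchies that this does not model, and the function names are hypothetical.

```python
def full_mesh_links(cores: int) -> int:
    """Direct links needed if every core is wired to every other core."""
    return cores * (cores - 1) // 2


def grid_links(side: int) -> int:
    """Links in a side x side 2D grid where only neighbours are connected."""
    return 2 * side * (side - 1)


for cores, side in ((4, 2), (64, 8), (1024, 32)):
    print(
        f"{cores} cores: full mesh {full_mesh_links(cores)} links, "
        f"{side}x{side} grid {grid_links(side)} links"
    )
```

At 1024 cores the full mesh needs 523,776 links while the grid needs 1,984, which is why neighbour-only layouts are attractive, at the cost of messages taking several hops to cross the chip.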

@jessepollard7132 1 year ago

That is what the crossbar switch is used for, with communication between the CPU and the shared cache that mediates access to the memory bus.

@RonJohn63 1 year ago

@@jessepollard7132 right. But even crossbars have a bandwidth limit. This is also why NUMA was developed.

@jessepollard7132 1 year ago

@@RonJohn63 yes, but it isn't a bandwidth limit - it's a number-of-switches limit for physical implementation. NUMA provides the same interconnections but with different constraints.

@jessepollard7132 1 year ago

@@expressionsartistic5856 Actually that was built by Sun, not Cray. If I remember right that was supposed to be the CS-64.

@310_Latchkey_kid 1 year ago

This is my first time watching one of your videos and honestly all I can say is that your answers to all those questions are very comprehensible and easy to understand! Great work.

@ccflan 2 years ago

One of the best YouTube channels out there, it feels like you should pay to see this content, so thank you

@LowLevelLearning 2 years ago

Thanks for the love as always!

@ObligedTester 2 years ago

Totally agree. I hope some of my youtube premium dollars end up on this channel 😅

@8lec_R 2 years ago

There's a patreon, feel free to pay. I can't afford to so I'd rather have content that is free and is viewer supported rather than something locked behind a paywall

@desmondbrown5508 2 years ago

I think it would be interesting to go into why GPUs CAN have so many cores, be parallelized more effectively and with better thermal efficiency, but CPUs cannot. I know the answer, but I do think it would be an interesting follow up video.

@JustinShaedo 2 years ago

Total agreement. I don't know the answer but I'm certainly curious!

@richardg8376 2 years ago

@@JustinShaedo A basic explanation would be that the kind of work a GPU does is easy to break up and spread among hundreds of small cores, and a GPU is designed for parallel processing on tasks that don't depend on each other. In a GPU you define a single program called a "shader", essentially a script which defines various inputs and what the GPU should do with those inputs. Each core on the GPU then runs in lockstep with each other: they all run the exact same shader script, albeit with different parameters. You cannot have half the cores run one script and the other half run the other. This is great for 3D graphics where the output of each pixel on a screen can be calculated independently, all using the same script. This is also why we still need CPUs and not just run everything on GPUs: GPU cores cannot run separate processes simultaneously on each core: only hundreds of copies of the same process with different inputs.
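
The "same script, different inputs" model described above can be mimicked in plain Python. This is only a CPU-side sketch: `shade` is a hypothetical stand-in for a shader, and the nested loop in `render` plays the role that thousands of GPU cores would play in parallel. What matters is that every pixel runs identical code and no pixel reads another pixel's result.

```python
def shade(x: int, y: int, width: int, height: int) -> tuple[int, int, int]:
    """Hypothetical 'shader': the same code runs for every pixel.

    Only the inputs (x, y) differ, and the result depends on nothing
    computed for any other pixel.
    """
    red = 255 * x // (width - 1)
    green = 255 * y // (height - 1)
    return (red, green, 0)


def render(width: int, height: int) -> list[list[tuple[int, int, int]]]:
    """Run `shade` once per pixel; a GPU would do these calls in parallel."""
    return [
        [shade(x, y, width, height) for x in range(width)]
        for y in range(height)
    ]


image = render(4, 2)
print(image[0][0])  # (0, 0, 0)
print(image[1][3])  # (255, 255, 0)
```

Because the calls are independent, they could be executed in any order or all at once, which is the property that lets graphics work spread across so many small cores.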

@Chezrlz009 2 years ago

@WJ gpus are designed to do a bunch of math at once. Each core is designed for a very specific task. Hence tensor cores, rt cores, etc. Cpus are supposed to be able to handle anything and everything, but maybe not as efficiently.

@Chezrlz009 2 years ago

@WJ I don't see how that shows GPUs being able to have more cores in the first place. Utilization is different than physical constraints. Also, they want the OS to work on old laptops or cheaper ones, the majority of which have really weak processors with few cores. If MS optimized Windows for PCs with 8 cores, PCs with 4 cores would struggle to do anything. I agree though. I wish Linux replaced Windows. Windows is a cash grab but so is everything in capitalism that isn't truly free.

@Chezrlz009 2 years ago

@WJ yeah it stinks. Sadly, quantum computing has the same issue as more cores. No one will want to switch, but nothing digital can be translated. Qubits aren't binary and work off of quantum superpositions. They are programmed entirely differently. Additionally, quantum entanglement is highly unstable and can't be observed or interacted with in any way or the particles will lose their entanglement and define themselves. That means you need to cool the qubits with liquid nitrogen, which is very difficult. You are correct about GPUs being useful to streamers though. For certain tasks, computers can use GPUs to perform tasks such as encoding streams and rendering videos on the go. Chip makers design a specific pipeline for a GPU that will help it perform tasks. GPUs made for rendering game graphics tend to work well with rendering streams, which is handy. Pipelines are basically made up of core clusters that each perform a different task, and instructions are sent through the pipeline to have things like shaders, sharpening, particles, textures, etc. applied. Edit: tldr: quantum computing and more cores have the same obstacle of the economy and society having trouble changing rapidly, and both, especially quantum computing, would disrupt a lot. :c

@jeffreymelton2200 1 year ago

I selected the video based on the name of the channel alone! Brilliant naming of the channel. Anyway, the video was very informative. I actually learned quite a bit from it. I appreciate the style in which you narrate your videos, making the subject matter incredibly comprehensive and digestible. Thank you for the content!

@renchesandsords 2 years ago

to be fair, in the science and datacenter space, that kind of core density can be effectively leveraged, threadripper and epyc proved it, and the development of processors like genoa and bergamo only serves to drive that point home further

@drstrangecoin6050 2 years ago

Yeah exactly. I got clickbaited by the title because I work on a system with over a thousand cores. Promise based task schedulers and MPI make it possible to recruit massive computing power for certain workflows and vectorizing loops over a distributed system is somewhat independent of code at this point. Old Perl script? Throw it into the OpenPBS scheduler with GNU parallel and loop over your entire data set as a matrix.

@prashanthb6521 2 years ago

@@drstrangecoin6050 I think you are getting it totally wrong. There is no machine with a single CPU of 1000 cores. You are using a cluster with independent memory bandwidth. That doesn't present any of the hurdles mentioned in this video at all.

@xeridea 2 years ago

The main issue is that, many tasks don't gain much efficiency from being split to many cores, due to having data dependencies on previous instructions. Generally, better applications for multithreading are those that have workloads easily divided up. Anything to do with graphics tends to be heavily threadable, which is why GPUs these days have upwards of over 10,000 tiny cores, you have millions of pixels on a screen, so it is easy to split up the work. Game logic, however, isn't as easy to split up, which is why games don't generally benefit from having more than 6 CPU cores. It would be trivial to have a CPU with 1000 cores, just shrink the cores. With CPUs though, it is generally better to have a smaller number of cores, that are better at executing code fast, than it is to have a crazy amount of simple cores. It is significantly more energy efficient to have more cores if workload can use them, which is why GPUs are so much more efficient at drawing graphics than CPUs. On the flipside, GPUs are pretty bad at general code, since to effectively use them, code needs to be what is referred to as "embarrassingly parallel". Many non graphics tasks are still able to be effectively programmed on the GPU, so they are still used for non graphics tasks, just not as CPUs.

@homeboy6668 2 years ago

Hey, could you consider making videos on compiler design maybe? it'll be cool to learn too. BTW, awesome video.

@raven4k998 1 year ago

How many cores will Windows need in the future, more than it needs now? 🤣🤣🤣🤣🤣

@herrxerex8484 2 years ago

This is one of my fav channels genuinely , could make a RISC-V series or compile resources for it to learn . would love to learn more riscv just don't have a structured way to yet

@LowLevelLearning 2 years ago

Working on it! :D

@joelsmusic7771 2 years ago

Risc processing is a college course offered at most universities.. I generally enjoyed working with this language.

@LowLevelLearning 2 years ago

@@joelsmusic7771 RISC is the general idea of reduced instruction set computers, whereas RISC-V is the open source architecture and spec for those processors. RISC-V is more like saying MIPS or ARM than RISC alone.

@mikapeltokorpi7671 2 years ago

I remember about drooling RISC processors in early 90's with my school mate. Seems to finally mature to commercial products (like Raspberry Pi replacement). Both high priced for the performance at the moment, though. However/depending on problem, you should get your discrete code running way faster than in CISC architecture on those.

@a.j.outlaster1222 1 year ago

Your thermal explanation makes sense, I have long wondered what if instead of a structure like Motherboard->Cores What if we tried something like Motherboard->Main Cores(Like a secondary Motherboard)->Cores And I now know that the thermal thing would most likely still apply to that, Thank you so much for this video, It made things clear! :D

@diconustra 2 years ago

I was sysadmin on a couple of multi-processor machines which had very similar issues scaling, except across CPU's and CPU boards. One was an IBM X460 with 4 interconnected chassis, each with four Intel CPU's, the other was an E25K with a half-dozen CPU boards, each with four SPARC CPU's. In each case, we ran into scalability issues related to memory bus bandwidth, the latency of memory fetches and bus I/O across chassis (X460) or boards (E25K). Operating system and database configuration and tuning helped, but ultimately both platforms faced diminishing returns on performance as boards and chassis were added, with 16CPU's being the sweet spot.

@llothar68 2 years ago

Apple's M1 Ultra already hit it. I'm very curious how they will design their Mac Pro. But I predict we go with multiple computers in the same chassis, also known as blades in the 2000s server days.

@JohnMiller-mmuldoor 2 years ago

6:51 I need me one of them intel I69420 processors 😆

@saricubra2867 2 years ago

I'm still using a 4 core 8 thread CPU from 2013 for audio. The code is NOT bad, IS the real time audio processing itself that runs in series which is the bottleneck. BUT, putting audio tracks in parallel scales way better with more cores. Some tasks by themselves ARE the bottleneck, not their code.

@zxuiji 1 year ago

Well adding more cores and keeping all buses available to all threads (albeit not at exactly the same time, just close enough) is easy, all that's needed is a dedicated chip who's only purpose is to loop through each boolean bit linked up to threads supported by the CPU to check if they need an operation done via RAM, the actual operation is then read from the thread that set the bit and once completed the bit is then cleared to say the operation completed, the thread doesn't need to care what bus was used, only that the operation was handled by the chip as soon as one was available. It being a chip it can be made to skip the "if N < THREADS_SUPPORTED" logic by just linking a power of 2 (2,4,8,16,32...) threads and letting the index overflow back to 0 during incrementation thereby reducing power consumption from it and the time it takes to get back to a waiting thread. As for RAM side of things, the most can do is sport an equal amount of RAM as needed to hold all apps and their virtual memory in memory, unlikely to happen any time soon though and would require some understanding on the user's side like "if I open this app with closing another then all apps will be slower due to surpassing RAM limit"

@CustomCans 2 years ago

I saw the title of this video and instantly thought of the Cerebras wafer scale processors - I think they definitively prove that computers and CPUs can have thousands of cores ;)

@ABaumstumpf 2 years ago

Can have and be useful for general purpose are very different things. You can build a jetpack, you can build a microwave that is powered by hamsters - does not mean you should do it or that it would make any sense to do so.

@leovang3425 2 years ago

@@ABaumstumpf more like having a supersonic airliner: sure, it's fast. But it's not economical, nor is it pleasant to be around.

@prateekpanwar646 1 year ago

@@leovang3425 Concorde

@erikshure360 2 years ago

It's pretty much impossible for Moore's Law to persist for another 50 years -- transistors can only be so small. If anything, a different form a computing will takeover by then -- like optical computing.

@4.0.4 2 years ago

And they'll call it quantum computing for marketing purposes. Which isn't entirely wrong but not what people expect.

@officialrights6009 2 years ago

Or analog computers

@matsv201 2 years ago

Well... yes but no. Effectively, the way it was originally perceived, it already died 10 years ago... well, really even more. The nm scale we have today is symbolic, not real. The transistor density has been increased by using other tricks like standing transistors or more layers.

@ABaumstumpf 2 years ago

At some point yes, it will stop being correct. And no sane person doubts that. But we do not know yet when that will happen. Also, quantum computers very likely are not the answer to 99.999% of all problems as far as we are aware - they simply are too slow and inefficient for anything that does not involve sifting through enormous amounts of combinatory possibilities. @@matsv201 "Effectively, the way it was originally perceived, it already died 10 years ago... well, really even more." No.

@matsv201 2 years ago

@@ABaumstumpf you probably need to motivate your statement a bit

@slurpieking4337 2 years ago

Honestly great video, very entertaining and informative. Good job!

@RunForPeace-hk1cu 2 years ago

More core = more memory = more cache = more interconnect speeds = more energy = more heat. Cache coherency nightmare.

@albertsun3393 2 years ago

Interesting thing about multiple cores is that coherence and even just latency in communication between multiple cores eats a huge chunk out of performance. Arbitrating cache coherency between one, two, maybe four cores isn't too bad, but when your critical path in coherency (or latching for multiple clocks) goes all the way across the chip, suddenly your performance drops like a rock. We've seen the transition from higher frequency to more cores because of the exponential increase in power consumption when increasing core clock, but with too many cores we sometimes struggle to even hit our initial clock due to all the overhead for everything else.

@Rokabur 2 years ago

From everything I've seen, more cores almost always means lower clock speed (unless you're overclocking). My quad-core i7-4820K runs at 3.7GHz while I've seen Threadrippers running at barely 3GHz.

@Demon09-_- 2 years ago

@@Rokabur the Threadripper is not quite a fair comparison; that's two different brands with different IPCs and applications. The lower clocks may be true to some degree, but it would be more fair to compare it against Intel's new i9-12900K, which has 16 total cores and will still run at 5GHz out of the box, with its mix of P cores and E cores. If you want more of an all-P-core comparison, you can look at the 10900K, which would still do 4.9GHz out of the box on all 10 cores and 20 threads. Comparisons inside AMD are similar: their newer 5950X, which is 16 cores, barely loses any clock speed to their lower 5600X. When comparing you really have to stay inside the same architecture. Intel does lose a little bit of clock speed compared to their lower chips when you compare server stuff. But server stuff is a little bit of a different ball game, where you can get up to 56 CPUs that have a lot of memory lanes.

@69k_gold 2 жыл бұрын

"If you're watching this on any device made in the last 10 years.." Me watching this on my 2008 Windows XP Professional PC with an Athlon chipset: *You're wrong*

@billyswong 2 жыл бұрын

We are in 2022 now.

@drakealex Жыл бұрын

hey, thanks for putting this into laymen's terms, I recently started on my CPU journey. good content.

@brolysmash9333 2 жыл бұрын

Bro you were recommended somehow but im glad I saw your video. I always wondered why not more CPUs but I’ve been procrastinating to commit into it lol

@mully006 2 жыл бұрын

This is a good video, but I think that you overlooked an important aspect, and that is HPC computing. While no single chips have thousands of cores, in high performance computing it is common to run code on many, many nodes, all with 64 or more cores. Additionally, GPUs are really just processors with more limited instructions, and they generally come with thousands of cores on a single die.

@dannygjk 2 жыл бұрын

The architectures of GPUs and CPUs are different it's not just about instructions.

@DJ_Force 2 жыл бұрын

You didn't talk about wafer yield. The more cores you have, the physically bigger the chip. The bigger the chip, the more susceptible it is to random manufacturing defects. Meaning, the bigger the chip, the more likely it is to be defective. This can dramatically raise the price since you get fewer sellable chips per silicon wafer.

@WaterZer0 2 жыл бұрын

So it's fair to say there's an ideal ratio in terms of cost to core count? At least from the manufacturer's point of view.

@DJ_Force 2 жыл бұрын

@@WaterZer0 Well, the smaller the chip, the better the odds it doesn't have a defect. But yes, too small and it won't be powerful enough to be competitive.
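
To put rough numbers on the yield argument in this thread, here is a minimal sketch using the simple Poisson yield model (yield = exp(-defect density × die area)). The model choice, the function names, and every number below (wafer cost, wafer area, defect density) are assumptions for illustration, not figures from the video or from any fab.

```python
import math


def poisson_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Fraction of dies expected to have zero defects (simple Poisson model)."""
    return math.exp(-defects_per_cm2 * die_area_cm2)


def cost_per_good_die(wafer_cost: float, wafer_area_cm2: float,
                      die_area_cm2: float, defects_per_cm2: float) -> float:
    """Wafer cost divided by the expected number of defect-free dies.

    Ignores edge loss and partially usable dies; it only shows the trend.
    """
    dies_per_wafer = wafer_area_cm2 // die_area_cm2
    good_dies = dies_per_wafer * poisson_yield(defects_per_cm2, die_area_cm2)
    return wafer_cost / good_dies


# Hypothetical inputs: a 10,000 (any currency) wafer with 700 cm^2 of usable
# area and 0.1 defects per cm^2.
small = cost_per_good_die(10_000, 700, 1.0, 0.1)   # small die, few cores
large = cost_per_good_die(10_000, 700, 4.0, 0.1)   # 4x the area, more cores

print(f"small die: {small:.2f} per good die")
print(f"large die: {large:.2f} per good die")
print(f"large / small cost ratio: {large / small:.2f}")
```

Under these made-up inputs the die with four times the area costs more than four times as much per working chip, which is the "bigger chip, more likely defective" effect described above.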

@therosses5 Жыл бұрын

the first computer I touched was the Tandy radio shack 80 model 1. I was surprised you were able to explain cores in a way an old guy can understand. very well done. I'm astounded that after decades the speed of our apps are still held hostage by sucky HD read/write nonsense.

@jessepollard7132 Жыл бұрын

Basically, a core is just a CPU. A multicore processor is just a collection of CPUs wired together to access RAM. It is why each core has an L1/L2 cache and sometimes L3 dedicated to its operation, then a shared cache for all CPUs to use for access to RAM. The shared cache is either called L3 or L4 (usually L4 if the CPUs have dedicated L3).

@specialopsdave 2 жыл бұрын

My dual-core desktop has had enough performance for everything until 2 years ago, but I don't play many AAA games anyways, so it still works fine for me.

@satrah101 2 жыл бұрын

Same here, running Linux on it. Gets the job done,

@diablo.the.cheater 2 жыл бұрын

Some tasks can only benefit from parallelization up to some limit x, some tasks simply aren't parallelizable at all, some have very minor gains that would add unnecessary code complexity, and some tasks you can always throw more cores at to go faster. In the general PC use case, most tasks are sequential, so more cores only benefit you if you are doing a lot of multitasking.

@rtyzxc 2 жыл бұрын

This. Game logic for example. First you tell a character to move x amount. Then you check for collision, and if hit, correct the position or execute some logic. Then you might check if the character is shooting, which again, depends on the character's position. Things have to happen on the correct order, you can't just have multiple cores do each thing simultaneously, or the results would get messed up depending on the order in which the tasks happen to be completed.

@techpriest4787 Жыл бұрын

@@rtyzxc that is why OOP is not a thing for games but data oriented programming makes more sense. All languages are OOP except of Rust. You can do DO in C++ and C# too but that is abuse. They are not really made for that.

@overloader7900 2 жыл бұрын

GPUs: 11k cores and more on the way

@caribougoo349 2 жыл бұрын

I'd think that GPUs existing and being powerhouses for parallel compute would also be a big factor making high core CPUs more of a niche use case. Probably important to acknowledge when a CPU is actually a good option for parallel tasks

@theldraspneumonoultramicro405 2 жыл бұрын

fun fact: there is a hard physical limit to how small a transistor can be, as eventually they reach such a small size that the electrons will freely flow through it, thus leaving the transistor permanently locked in an on state. Following Moore's law, we should reach that physical size limit as early as 2023.

@christophercuston 2 жыл бұрын

Hence, Intel's 12000 CPUs. AMD's Threadripper.

@DDRWakaLaka 2 жыл бұрын

0:10 I think you've confused two different facts -- IBM's POWER4 is from 2001. You might be thinking of AMD's Athlon 64 X2, which is the first *consumer* level dual-core chip and is from 2005.

@CocoaEm Жыл бұрын

the power4 chip he was on about isnt even multi core, its straight up 2 cpus on the same wafer.

@DDRWakaLaka Жыл бұрын

@@CocoaEm Yeah, I'm realizing now he's likely referring to the PPC970MP. Which, like you said, was MCM, not two native cores.

@johndoh5182 2 жыл бұрын

At 7:00, this issue is once again what I mentioned for 2:15. It's multi-threading. I blame Intel for programmers trying to play catch up with modern many core processors, although there were many software engineers that knew Intel was wrong, and this goes back to Intel vs. AMD around the time of 2008 - 2011, when desktop CPUs had loads on them that started to become large. Intel basically said you don't need to add more cores to desktop computing because they will be able to keep improving IPC and getting clock speeds faster and faster. And they seemed to be right because their 2c/4t CPUs were better than AMD 4c/4t CPUs. And when AMD came out with an 8c/8t CPU it didn't fare a lot better. Well, those first gen 8c/8t CPUs had core pairs that shared a single FPU while each having their own ALU. I know from going through classes that I was taught that FPU math was better. It really wasn't as far as running it on an X86-64 CPU. It is NOW, but it wasn't so much then and it certainly wasn't for AMD 8c/8t 8 ALU/4 FPU CPUs. If only professors would have remembered WHY it is you learn Discrete math. Regardless, the point is one of multi-threading and how well an application does it. This is something that takes a lot of work for programmers, and lazy developers in the last 2 decades didn't want to think about it and Intel gave them a reason not to. Writing and testing multi-threaded software is harder. I can write a multi-threaded algorithm that is the same speed as a single threaded algorithm or possibly even slower. If one thread is simply waiting for another thread to finish work, such as I have a main thread that spawns another thread to run some function, but my main thread is waiting, this is slower even though it's multi-threaded. So multi-threading requires experienced programmers or engineers to work with a project to evaluate the software development, and it isn't always so obvious if doing one thing vs. another is more beneficial. 
There was one solid point you brought up other than the failure of programmers over the last 15 years to move towards developing their skills writing multi-threaded applications, and that was memory bandwidth. There is nothing other than that you brought up which is a physical limitation until there are other conditions thrown into the conversation, which then means this conversation needed to come from a person who can describe power efficiency, nodes, how clock frequencies affect power efficiency, etc........

@Dhaydon75 2 жыл бұрын

Another problem is you can have more cores or higher IPC and Freq but still be slower. But that is more of a time critical system problem.

@billyswong 2 жыл бұрын

The infrastructure and tools for efficient multi-thread software development are not yet polished enough. In theory a programming language could handle thread pool implicitly, in an OS-neutral way. Meanwhile the OSes would provide part or all of the thread pool implementation such that multiple programs using thread pool at the same time won't overcrowd the CPU and introduce unnecessary task switching.

@ABaumstumpf 2 жыл бұрын

" blame Intel for programmers " And you'd be wrong. Or rather you are just wrong. "So multi-threading requires experienced programmers or engineers to work with a project to evaluate the software development, and it isn't always so obvious if doing one thing vs. another is more beneficial." If that were the only thing. Many problems simply can not be processed in parallel. The towers of hanoi are a often used example. And not to mention all the other problems like coherency, scheduling and specially the bugs that creep up.

@johndoh5182 2 жыл бұрын

@@ABaumstumpf I know not every problem can be solved by multi-threading. There has to be real parallel paths in processing for multi-threading to make any difference. But that parallel path can simply be a few microseconds and it's still beneficial. It can be two sets of calculations that can happen independently and you'll get benefit. However Intel said EXACTLY what I said they did when AMD was releasing CPUs with more core counts than Intel. So yes, they were part of the problem. And yes, programmers have been lazy in many companies, and yes many programs can be written much better. You're about to see this play out in game engines and what happens when you bring near realistic graphics to a game. Part of this of course is the ability of a GPU, but part of this is the game engine. Unity for instance has been notorious for saying that since a main game thread dictates how fast a CPU can process code, having a game engine be multi-thread only adds complexity with no benefit. On the other hand, Epic Games released UE5 and games are going to be coming out on it starting the end of this year. I watched demos of Matrix Awakens and it was pushing a 5950X to around 40% CPU utilization. Simple math says this game with this game engine overwhelms a 4c/8t CPU, it pushes a 6c/12t CPU to 100% so even that is going to be a bottleneck, and an 8c/16t CPU is going to be minimal to run the game without the CPU being a bottleneck due to being overloaded. There's other reasons the CPU can be a bottleneck, but this is going to be the first time as far as I know that for PC gaming, a 6c/12t CPU is going to be a bottleneck simply because it doesn't have enough cores. YES, INTEL SAID that gaming would never require more than 4 cores. Now, finding old information with a search engine isn't very easy, so I'm not going to bother digging. Of course by the time they put out 9th gen, AFTER AMD had released very effective 8c/16t CPUs, Intel did a 180 on THAT statement. 
I'd be a millionaire if I had a dime for every time I've heard a game will never need an 8c/16t CPU. Maybe a slight exaggeration, but I think you get the point. What I think is going to happen is if a game company wants to develop a game that looks realistic, they're going to use UE5 and Unity will be relegated to more simple graphics. Autodesk, same thing. Their software gets poor CPU utilization and often when people have a powerful system, EVEN WHEN the software is rendering an image on screen, it's painfully slow. You read comment threads on their site for different software packages and users complain about this, and point out other software that does the same type of rendering and it's much faster. Adobe, same thing. They've improved SOME of their software. At some point people will leave these companies behind when new hardware is still running like a turtle. So yes I know some software cannot be optimized more than it is. But I also know that thousands of students have gotten a BS in software engineering and their professors never emphasized multi-threading along with testing multi-threaded applications. And I also know that in many cases, I'm right and we're going to agree to disagree. I was a person BTW who went through most of a BS degree in software engineering (I had already retired from the military and time was catching up to me along with my back breaking down) and saw this first hand. I ended up having back surgery before my senior year, and after that point I only wanted to work part time and didn't feel like putting 100% of myself into another career.

@johndoh5182 2 жыл бұрын

@@billyswong I agree, and I'm sure there are still many universities that don't push software engineers to program this way, and testing is hard. Testing effectiveness for multi-threaded applications, when the intent is to speed up the time it takes to run means time testing along with testing that functions work the way they're supposed to. Multi-threading can slow down an application if done improperly. Simply spawning threads to complete a task, if some other thread is simply waiting for that data can slow down performance due to passing data back and forth. So yes it does require testing and the testing is going to be very complicated, but in the end it's the right thing to do for applications that require a bit of computation, and not simply a text editor or other simple computing. "Meanwhile the OSes would provide part or all of the thread pool implementation such that multiple programs using thread pool at the same time won't overcrowd the CPU and introduce unnecessary task switching." When you have something like a 6c/12t CPU even the Windows schedulers do a good enough job at minimizing context switching. That's not really the issue. Sure if you're doing a bit of multi-tasking it can become an issue but that's not really what I was talking about. And even with multi-threaded apps, I would think that between the application and the scheduler, the scheduler isn't randomly switching a core from one thread to another. I would think that since many threads are short lived, they run to completion so data can be passed, before another thread is loaded to that core (where even with a 6c/12t CPU, it's viewed as 12 cores). When you move up to 8c/16t CPUs and even more cores, this should get easier for a scheduler to handle.

@RealCadde Жыл бұрын

It would be worth mentioning the difference between parallel and linear programs as well. A linear program is one that, in a simple example, takes the output of the previous operation as an input for the next operation.
a
a + b
ab + c
abc + d
...
That's a linear operation. A parallel operation on the other hand does NOT rely on the result of the previous operation in the program as a whole. Using the previous example again, but making it parallel...
Core 0: a, a + b, ab + c, abc + d
Core 1: e, e + f, ef + g, efg + h
Core 2: i, i + j, ij + k, ijk + l
Core 3: m, m + n, mn + o, mno + p
Then once all four cores have run their code on their slice of the data, they can synchronize and this happens:
Core 0: abcd + core1 + core2 + core3
or... abcd + efgh + ijkl + mnop
But before that can happen, ALL cores must have completed their slices. In this simple example it's no biggie. Each core runs its slice linearly and in linear time too. So they should all finish at the same time. But in reality, not every program is that simple. Some slices take more time than the others to complete as they do more complex operations. In the meantime, all other cores are just sitting around waiting for the most complex operation to finish. Well, they are free to do other things but not for that one program, as the program is waiting for the biggest slice to finish. Being able to evenly slice up threads of a program such that they all finish at roughly the same time is almost impossible in more complex programs. Especially when you aren't the only program using the cores available, as the scheduler might not agree with the program using all cores at that moment in time. A somewhat perfect example of parallel tasks that actually do take the same amount of time every time (almost) is what the GPU is doing. The GPUs of today have some ten thousand cores. They all work on their own slice of a rendered image. Say you have an image that is 1000 x 1000 pixels large, or a megapixel image if you will. 
Those 10,000 cores will each be working on a region that is 100 pixels large. If the task is to fill a gradient horizontally across the screen, then each core simply takes the starting and ending colors and interpolates those going from start to end in their block. This operation takes exactly the same amount of time on each core so it just works on GPU's... Because graphics is less complex than programs are in that sense. Graphics don't tend to sit around waiting for user input, network communications and access to memory. Each batch on a GPU has exclusive access to memory and all cores. The more data and operations you can cram into a batch the better, otherwise you have to keep telling the GPU what to do. In other words, it's better to tell the GPU to draw ten million polygons in one batch than it is to tell the GPU to draw a million here, a million there and another million there... When the GPU has ALL the data in one batch, it splits the tasks amongst all cores equally and just barfs pixels back at you in no time.
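
The slice, work, wait, combine pattern described in this comment can be sketched roughly as below. The names `sum_slice` and `parallel_sum` are made up for illustration, and the worker count of 4 just mirrors the four-core example. One caveat: in standard CPython the GIL keeps these threads from actually adding numbers at the same time, so this shows the structure of the approach, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_slice(values):
    """The 'linear' part: each step depends on the running total before it."""
    total = 0
    for v in values:
        total += v
    return total


def parallel_sum(data, workers=4):
    """Split data into one slice per worker, sum each slice, then combine."""
    chunk = max(1, (len(data) + workers - 1) // workers)
    slices = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) blocks until EVERY slice is done: this is the
        # synchronization point where fast workers wait for the slowest one.
        partials = list(pool.map(sum_slice, slices))
    # The final combine step (abcd + efgh + ijkl + mnop) is sequential again.
    return sum(partials)


print(parallel_sum(list(range(16))))
```

If one slice took much longer than the others, the `list(pool.map(...))` line is where the remaining workers would sit idle, which is the load-balancing problem the comment describes.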

@lucasdegreef5455 2 жыл бұрын

Best content on youtube , can you post more video on esp-idf and freertos

@AlessioSangalli 2 жыл бұрын

"Symettric" (5:05) well typos happen 🤣 seriously however the quality of the production is awesome, I wish I were this good with video editing. What program do you use, out of curiosity?

@LowLevelLearning 2 жыл бұрын

Hahaha crap, there’s always one. I use Davinci Resolve, largely because it’s free XD. Thank you!

@vikassm 2 жыл бұрын

@@LowLevelLearning Free, yes, also the small matter of it being the most powerful, fully featured A/V production suite in the world 🤣 If it works for MARVEL, I'm sure us 'lowly' KZbinrs can make do with DaVinci Resolve 😂😂

@bwiebertram 2 жыл бұрын

In the future, one super computer will do the work for every person on earth

@philipmcdonagh1094 2 жыл бұрын

You answered everything when you said there was a Boss core. Take the real world what do Bosses do, slow overall work performance down. Thank you.

@danbhakta 2 жыл бұрын

Just discovered this channel...Subbed!

@zolp 2 жыл бұрын

There are already 3-digit processors in abundance, memory access also continue to improve, and there are many applications that parallelizes well. GPUs already have thousands of cores and are put to good use.

@romanpul Жыл бұрын

Yeah, but you can't really compare GPUs to CPUs. To my (admittedly kinda limited) understanding, GPUs more closely resemble vector processors and are only efficient for use cases where your input data can be vectorized (i.e. cases where you fetch huge chunks of data at once and then crunch it). CPUs, on the other hand, are much better at crunching data which requires frequent, atomic memory access, due to their way more elaborate caching architecture

@nickscurvy8635 2 жыл бұрын

Some electrical engineers, when confronted with a problem, say "I know, I will use more cores". Now they are the ceo of amd

@sodiumvapor13 2 жыл бұрын

Great explanation! Subbed

@rickpontificates3406 Жыл бұрын

DMA comes into play also. Memory allocation is important, but having a CPU's MMU core managing its own memory helps ease the bottleneck

@seeibe 2 жыл бұрын

Thanks to GPGPU, we already effectively have CPUs with thousands of cores, just with some limitations.

@matsv201 2 жыл бұрын

That is very true... but the flip side of that is that applications that are easy to multi-thread run on GPGPU, while the ones that are not run on the CPU... which again limits the use of many cores

@null6482 Жыл бұрын

Hehe. "GPGPU"

@endurofurry 2 жыл бұрын

i had a 9980XE which is an 18 core processor, but it only gets up to 4.5GHz, so I decided with the new 12th gens and DDR5 I would upgrade to the 12900KS, which is a 16 core (8 efficiency, 8 performance cores) at 5.5GHz, and honestly I think my system ran better with the older extreme edition than the much faster newer processor. So it doesn't seem speed is everything either. I figured the much faster speeds would make up for the few cores lost but they really didn't. I use this PC for gaming, and most games don't even use more than 4 cores, so my assumption was that faster single core performance would be better than more cores, but that seemed to be false.

@Demon09-_- 2 жыл бұрын

eh you should have seen better performance in games if you were CPU bound. Games these days can and will easily use over 4 cores, and depending on your GPU, the settings and the game you could see quite high fps improvements. But if you're running a lot of background or other applications, more total performance may benefit you more than having the higher IPC. Not to mention DDR5 is quite meh atm and basically equal to fast DDR4 kits.

@OverDriveOnline7921 Жыл бұрын

In the world of x86, there have been multi processor systems for many years; I used to fix them frequently in the mid to late 90's. However, back then the practical limit was 4 processors before system performance was hit. Anything more than 4 was divided into sub groups of up to 4 processors and interlinked with a separate scheduled data transfer architecture (until transputers came along, but that's another story). This limit was overcome, in part, by adding complex cache systems, and while 8 processor systems were now possible cheaply, there were two issues looming on the horizon: Moore's law and physical space. The answer to keeping up the speed bumps predicted by Moore's law? Bung more than one processor on a chip. This helps with space and, oddly enough, power consumption too. Further advancements have helped shove more cores, essentially what we used to call our CPU, onto a single chip, boosting performance as we go. However, doubling the cores does not double the performance. There are and always will be bottlenecks, which become greater the more cores are added, plus the thermal envelopes that our systems need to run under. In many systems now we get past this by breaking the chips down into multiple chiplets, essentially smaller chips on a single chip chassis, or by adding multiple chips, meaning we've gone full circle. Still, it's been interesting from my view, watching computing develop over my (nearly) 51 years at the time of posting this. With 3nm chips due to become mainstream, whole RAM modules fit into the space of an entire CPU from 4 decades ago.

@gandalfdergraue8444 Жыл бұрын

A very good explanation for CPUs and their cores...

@lockdot2 2 жыл бұрын

I am one of the few people still using a single core CPU to watch KZbin. The CPU I use is a AMD LE 1640, with 1 core, 1 thread.

@utubekullanicisi 2 жыл бұрын

You're able to stream at 4K no problem, right?

@Elinzar 2 жыл бұрын

Man... How?, Im sure even if you dont have much money you can scrap some am2 cpu with at least double the cores for like nothing these days and swap that cpu out, is it a desktop cpu right?

@dannygjk 2 жыл бұрын

@@Elinzar Sounds to me like they have a small laptop or netbook. I have a netbook it is also 1 core 1 thread and only 2 GB RAM. I would add more RAM but I don't think there are RAM modules larger than 2 GB for it and there is only 1 RAM slot. It can barely stream a video at 360p.

@saricubra2867 2 жыл бұрын

9 year old 4 core 8 thread Intel Core i7-4700MQ here at 3.4GHz 99% or 100% CPU use in one thread for gaming, audio, even for loading and saving stuff to the HDD (now SSD and the CPU is the bottleneck for the SSD, still ridiculously fast). That microbe AMD system would 100% freeze in a DAW lmao.

@Elinzar 2 жыл бұрын

@@dannygjk i looked it up and one page said it was a desktop cpu from the AM2 platform other page said it was a 2014 chip...

@leftlovers9137 2 жыл бұрын

I searched this and voilà, I found your video 11 hours after upload lol

@LowLevelLearning 2 жыл бұрын

LOL, good find ;)

@seanvinsick5271 2 жыл бұрын

There's also a limit with high core counts that slows things down to the distance between cores too. Think of a cube, each vertex is a core. Adjacent vertices work well, non-adjacent means you have to use a path finding algorithm. It doesn't scale as core numbers increase. You have to have a master core that tracks all of this too.
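
The cube picture in this comment can be made concrete with a small sketch. It assumes a hypercube-style layout where each core gets a binary label and two cores are directly wired only if their labels differ in exactly one bit; the function names are made up, and real chips use other layouts (rings, meshes), so treat this purely as an illustration of distance growing with core count.

```python
def hops(core_a: int, core_b: int) -> int:
    """Minimum number of links between two cores in a hypercube.

    Each differing bit in the labels costs one hop (the Hamming distance).
    """
    return bin(core_a ^ core_b).count("1")


def worst_case_hops(dimensions: int) -> int:
    """Longest shortest path from core 0 to any other core."""
    return max(hops(0, other) for other in range(2 ** dimensions))


# A 3-dimensional cube has 8 cores labelled 0b000 .. 0b111.
print(hops(0b000, 0b001))   # adjacent corners
print(hops(0b000, 0b111))   # opposite corners
print(worst_case_hops(10))  # 1024 cores
```

Even in this fairly well connected layout the worst-case distance keeps climbing as cores are added, and each core also needs one more link every time the core count doubles.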

@jessepollard7132 Жыл бұрын

That depends on the implementation. using a mesh network, it is up to the network connection processing to do that, not any core as that alone would make it slow.

@seanvinsick5271 Жыл бұрын

@Jesse Pollard I think you misunderstood my point. I'm referring to the cpu, there's a limit to how many cores/cpus you can have in a single machine.

@jessepollard7132 Жыл бұрын

50 years ago there were multiprocessors, which did exactly the same thing as a multi-core unit does. The limit then was about 5 processors as a max (mostly due to the memory contention limits you indicated). Some systems got around the contention by using multiple memory busses, and it was up to the programmer (or schedulers or both) to avoid the contention by assigning each processor a different memory map (usually the map was in 64KB units but could be larger), thus allowing each memory bus to operate independently without contention with other memory busses, so a much higher throughput could be achieved. Some motherboards do have parallel memory busses (which tends to require memory chips to be installed in pairs).

@jessepollard7132 Жыл бұрын

YUP. It was Seymour Cray that figured out how to handle multiple processors optimally by using a crossbar switch in the Cray systems produced by Cray Research.

@littlemeg137 2 жыл бұрын

The Paracel GeneMatcher had 6,144 cores. The Connection Machine had 65,536 cores.

@Cyberfoxxy 2 жыл бұрын

Meanwhile a common GPU boasts 8000 cores. Tho they are much slower and have only a small set of instructions. Also the instruction set is not standardized. As such OpenGL/OpenCL is implemented by the vendors themselves.

@coleshores 2 жыл бұрын

Still Turing complete though. There are highly parallel SQL Databases which run entirely on the GPU, such as Omnisci (formerly MapD) for example.

@moneyharry 9 ай бұрын

I am so glad to found your channel

@griffihn 2 жыл бұрын

thank you for this concise and to the point explanation

@johndoh5182 2 жыл бұрын

6:00, Thermal efficiency. This is hard to throw into a conversation about core counts because a CPU can be lower speed or high speed. Then you have constraints of a node being used. These things together mean that Thermal efficiency has little to do with how many cores can go into a CPU, or if we want to be more technical, a die or chiplet. If one says for instance that due to a thermal limit of X, this die can only have 8 cores, that's not really a true statement. It's more along the lines of, due to the thermal limit of X and running a processor at a speed of Y, on THIS node a core chiplet using AMD's Zen 4 X86-64 core should have no more than Z cores. Every node has different thermal limits, and different characteristics which cause ever faster speeds to heat the die up to the point where thermal limits are the main constraint. You can clock Intel's Intel 7 obviously up to 5.3 - 5.5GHz which is consuming a large amount of power but clearly it's not affecting the efficiency of the core to do its work. What is happening more is POWER efficiency rather than thermal efficiency. On the other hand, TSMC N7 isn't efficient over 5GHz in any way. Maybe this will change over time. So thermal efficiency is really an edge issue, not a main issue. I could have a die with 30 cores if I run them at one speed, and only 8 cores if I run them at another speed when loading all-cores to 100%. So, that's not a BIG constraint and not one I would have led off with. This is a situation that just because someone has put out some data, you have to be careful on how you use that data. It's a neat chart that was shown but only useful for some use cases. There had to have been a lot more data talked about before that chart was shown, or David Henderson from GA Tech is not very sharp. Without talking about all that other data, this point is like my other comment, painting a wrong picture.

@AnarexicSumo 2 жыл бұрын

How pedantic. Firstly, it's an issue. Whether you think it's a fringe or main issue isn't really here or there. Secondly, your comparison to a slower processor with more cores being cooler is intentionally arguing in bad faith. All else equal, a processor with twice the cores will run hotter and require more cooling to run at its best. In fact due to inefficiencies they will run *disproportionately hotter*. As a rule, consumer CPUs with more cores require more cooling.

@johndoh5182 2 жыл бұрын

@@AnarexicSumo So what you're saying, then, is that every time you use a new node, the argument changes. "In fact due to inefficiencies they will run *disproportionately hotter*. As a rule, consumer CPUs with more cores require more cooling." So far these inefficiencies ARE related to clock speed. Every node that every fab makes has a point beyond which pushing the clock higher costs more energy than the extra work done by the CPU is worth. AND this is INDEPENDENT of core count. As a rule, more cores require more cooling when everything else is equal. But that's the point: everything else is always CHANGING! So there are no HARD rules for core count with regard to THERMAL EFFICIENCY. It depends on everything else. It's a secondary point, NOT a primary one. THAT is the point.
And yes, that is arguing in good faith. The points made in the video are argued in bad faith. To quote: "In full transparency some processors these days"... and then he proceeds to talk as if it were magical that a 64 core CPU exists, which he simply called "double digit", which I find laughable. So yes, thermal efficiency is ONE point, but I could probably put 50 cores of compute power in an Apple iPhone using TSMC N3. I don't NEED to, but because that die is clocked slowly, those tiny cores would draw next to NOTHING at the speed at which they operate. So in that case, thermal efficiency ISN'T a limiting factor for the number of cores in the device. And that's why I made the point I did. There's no such thing as a certain number of cores that creates a thermal inefficiency. It depends on too many other factors.
Here are points made in good faith for the limit on core count:
Memory capacity. Each core needs a certain amount of memory space. What that amount is varies widely because it depends on the applications being run.
Bandwidth into and out of the CPU. The bandwidth needs to be capable of handling the input or output of data that each core could require. What this amount of bandwidth is varies widely because it depends on the applications being run.
Capability of the operating system. The OS has to be able to schedule processes (threads) for each core. If there are so many cores that the scheduler cannot direct threads to each core because it is not fast enough to rotate through all of them, then this is too many cores for that operating system. But this varies widely and depends on the applications being run, because a thread can be short lived or long lived.
I'm trying to think of limitations, and the MAIN one that comes to mind is space constraints. This is a REAL constraint, because it doesn't depend on other factors. So, space. AMD is going to be able to release server and WS CPUs with Zen 5 that can have 192 cores, or even more. Based on current space, that's what AMD will be able to do with TSMC N3 with either a server MB or a WS MB. And if you're wondering how I get that figure, N3 triples the transistor density over N7. But AMD could be moving to big-little for Zen 5, and AMD might be moving to L3 cache being off-die and stacked, in which case, based on current space constraints, they could probably get up to 256 cores on a SINGLE Zen 5 EPYC CPU. But they'd have to make other changes to the CPU architecture and other architecture to pull that off. PCIe gen5, even with all the lanes that EPYC has, probably won't move data fast enough, so it would probably need to use PCIe gen6, which means the rest of the hardware would need to be PCIe gen6. And then DDR5 with 8 memory channels wouldn't be good enough even at the fastest rated speeds. And, with DDR6 supposedly using the same data word length as DDR5, I highly doubt memory bandwidth would allow for that many cores for many SERVER applications. You'd have to rely on many of the cores already having cached the instructions they need to run so you don't have a couple hundred cores trying to hit memory at the same time.
But would "thermal efficiency" be an issue for a 256 core CPU? For a server application using TSMC N3, which uses about 40% less power than N7, where boost clocks are usually in the low 3GHz range? No, each core could run very efficiently. Total package power could be exceeded though, and that's not an issue of "efficiency". It isn't a limit because it's not "EFFICIENT". It's a limit because it's too much for that package. I THINK AMD could release a 192 core EPYC CPU, so well into triple digits, which makes this guy's "double digit" comment a complete JOKE.
I THINK that with TSMC N3 and the lower clock speeds of EPYC, AMD can get up to 192 cores with Zen 5 as long as DDR5 has hit much faster speeds (they're at 6400 now) AND you increase memory channels to 12, AND AMD has moved to stacking L3 cache and it uses something along the lines of 192MB - 256MB, AND the hardware platform is using PCIe gen 6, AND AMD adds 25% more PCIe lanes to the CPU, although maybe the move to PCIe gen6 is good enough to handle the bandwidth needs of that many cores with the existing lanes they have now for EPYC. And I hope that helps to clear up your lack of understanding on this topic. If not, we'll agree to disagree.
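
A rough back-of-the-envelope sketch of the memory bandwidth argument in the comment above, in Python. It assumes each memory channel moves 8 bytes per transfer (a 64-bit data path) and reports theoretical peak numbers only; the function name and the core/channel combinations are illustrative, not taken from any vendor spec.

```python
def per_core_bandwidth_gbs(transfer_rate_mts, channels, cores, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth available per core, in GB/s.

    transfer_rate_mts: memory speed in megatransfers per second (e.g. 6400 for DDR5-6400)
    channels: number of memory channels on the CPU package
    cores: number of cores sharing those channels
    bytes_per_transfer: assumed channel width in bytes (8 = 64-bit, an assumption)
    """
    total_bytes_per_second = transfer_rate_mts * 1_000_000 * bytes_per_transfer * channels
    return total_bytes_per_second / cores / 1e9


if __name__ == "__main__":
    # Hypothetical configurations: (memory channels, core count)
    for channels, cores in [(8, 64), (8, 256), (12, 192)]:
        share = per_core_bandwidth_gbs(6400, channels, cores)
        print(f"{channels} channels, {cores} cores: {share:.1f} GB/s per core")
```

Under these assumptions, 12 channels of DDR5-6400 shared by 192 cores leaves about 3.2 GB/s per core at best, and 8 channels shared by 256 cores leaves about 1.6 GB/s, which is why the comment leans on caches to keep most cores off main memory.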

@AgentSmith911 2 жыл бұрын

I just discovered a law that is a lot like Moore's law, but for cores. It says that eventually we will, in theory, reach so many cores that adding more cores and threads no longer matters. It's called Amdahl's law.

@matsv201 2 жыл бұрын

That law is often misunderstood; it's about compute latency, not performance.

@davidolsen1222 2 жыл бұрын

Amdahl's law is about the relationship between the different performance-related parts of a computer. If you take some section that takes 90% of the time and hyper-optimize the crap out of it, so that it takes 10% of its previous time, you've managed the amazing feat of speeding up the system by roughly 5X, and now you need to optimize the other stuff that didn't take much time before. You end up speeding up one part, and that's good, but then the other parts become wildly more important and you get diminishing returns on those types of optimizations.
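
The arithmetic in the comment above can be checked with a minimal Python sketch of the standard Amdahl's law formula, speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time being optimized and s is how much faster that part becomes. The function name is made up for illustration.

```python
def amdahl_speedup(optimized_fraction, part_speedup):
    """Overall speedup when `optimized_fraction` of the run time gets `part_speedup` times faster."""
    remaining = 1.0 - optimized_fraction          # time share left untouched
    improved = optimized_fraction / part_speedup  # time share of the optimized part afterwards
    return 1.0 / (remaining + improved)


if __name__ == "__main__":
    # The example from the comment: 90% of the time, made 10x faster.
    print(f"{amdahl_speedup(0.9, 10):.2f}x overall")
    # Even an absurdly large speedup of that part cannot beat 1 / (1 - 0.9) = 10x.
    print(f"{amdahl_speedup(0.9, 1_000_000):.2f}x overall")
```

With 90% of the time made 10x faster, the result works out to about 5.26x, and no amount of further work on that same part pushes the total past 10x because the untouched 10% now dominates.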

@jessepollard7132 Жыл бұрын

Already limited by the bottleneck between CPU and RAM.

@bikdigdaddy Жыл бұрын

Hey buddy, a request: please make a video on cache and registers from the POV of a programmer. Nice videos, btw.

@Jetx128 2 жыл бұрын

Thanks for the info!! Maybe a video on how operating systems work?

@matsv201 2 жыл бұрын

Intel made a 1000 core processor... back in 2010... it really wasn't that large, it was a fork of the 386 meant to run graphics code... so an x86 GPU.... It turned out to not really work well.. but the processor worked

@zredplayer 2 жыл бұрын

A real 1000 core CPU? Do you have proof that it exists?

@ultrapetey 2 жыл бұрын

@@zredplayer en.wikipedia.org/wiki/Larrabee_(microarchitecture)

@Sourcer3r 2 жыл бұрын

Chips with many hundreds of cores are already running well, just in a different way than you might have thought at first: GPU, or more specifically GPGPU (general purpose GPU), applications. Just think for a moment about Ethereum, AI (self-driving cars), rendering or scientific research (protein folding, space analysis). Of course, your standard operating system will not boot with just a GPU, because the instruction set on a GPU compute unit is very limited. This might change in the future: take a look at the Apple M1 or any ARM (mobile) chip... They can run more efficiently in consumer applications because they carry fewer instructions (and therefore fewer transistors and shorter paths (wiring) that generate heat).

@youtubeshadowbannedme 2 жыл бұрын

Just because they run more efficiently doesn't mean they'll give good raw performance. The M1 chips excel in both performance and efficiency because of the way Apple designed them to compete with Intel and AMD in the computer market. It's like how Intel was able to make x86 chips that were practically a knockoff of ARM back then, under the Atom brand. It only happened when Intel specifically went out of its way to make an extremely efficient x86 CPU.

@izakgodsservant 2 жыл бұрын

Very informative, presentation was concise and entertaining.

@spartanwarrior9755 Жыл бұрын

Thank you for demystifying this stuff.

@triularity 2 жыл бұрын

It's more likely that the number of "cores" will keep increasing, but most of them will be specialized (i.e. not full CPU cores with full system access). Instead, there could be a bunch of cores doing something dedicated (but still programmable), such as encryption or compression, in a way where they mostly keep to themselves except when being sent input or outputting results.

@mornnb Жыл бұрын

That has trade offs - you have a large number of cores that can only be used for specific tasks that will spend a lot of time idle, where you could be using the transistors for general purpose tasks that can always be used.

@CocoaEm Жыл бұрын

This already is a thing: there's a dedicated encryption engine on every modern CPU. Some tasks really do need that extra space on the die to be faster.

@DDRWakaLaka Жыл бұрын

Like Cell? Which was trash?

@triularity Жыл бұрын

CPUs already having encryption engines is a start. And some CPUs do include embedded GPUs for video, but it would be better to have one by default, even if there is no display support. Nowadays, going a step further and including a few tensor cores would be useful, with ML being more common. Maybe even multi-precision integer math with common functions used in modular math (not just the basic add/multiply operations of SIMD), so newer (or less mainstream) encryption could still benefit and not be limited to whatever happens to be in the bundled crypto engine. I personally hate it when crypto libraries don't include low level APIs for some standard algorithm.. so when a variant algorithm is needed to support some protocol, it forces developers to practically reinvent the wheel and roll their own from scratch, rather than re-using the existing implementation for most of it, which is just asking for a broken/insecure implementation. So why should it be all-or-nothing for hardware crypto either?

@AlejandroRodolfoMendez 2 жыл бұрын

So far, Windows for desktop has a limit on the number of cores that can be used; Linux does not. But it's something to consider for the future. Maybe when the core limit is reached, they will put the emphasis on the number of instructions per cycle.

@clovernacknime6984 2 жыл бұрын

They did, long ago. That's what pipelining, superscalar, out-of-order-executing processors are all about. However, there are limits to how much you can auto-parallelize a single thread, so they turned to multi-core - which makes the programmer parallelize explicitly - out of desperation, since all other avenues for improvement were exhausted. The future is more cores, because we hit the point of diminishing returns for adding more transistors to a single core long ago.

@AlejandroRodolfoMendez 2 жыл бұрын

@@clovernacknime6984 That has been seriously attempted on regular CPUs since the Pentium 4; before that it was more on servers and specialized CPUs. RISC did more, but at the expense of the operations. Maybe a return to CISC could work too.

@Conenion 2 жыл бұрын

@@AlejandroRodolfoMendez Since Pentium Pro around 1995 all Intel CPUs are RISC-like internally. AMD followed. x86 CPUs are CISC from the outside, but internally they use all of the "tricks" that make RISC CPUs so fast.

@Conenion 2 жыл бұрын

@@clovernacknime6984 > out of desperation, since all other avenues for improvement were exhausted. Exactly. Well said.

@AlejandroRodolfoMendez 2 жыл бұрын

@@Conenion they weren't full risc tho. But yes they were doing stuff like that before.

@blazedyoda8608 2 жыл бұрын

Great video man

@frenchmarty7446 Жыл бұрын

For a given die size and transistor count, you have to balance: 1.) More branch prediction and larger cache, things that every program takes advantage of by default. 2.) More/faster I/O and memory bandwidth, which also consumes die space. 3.) More pipelining/superscalar operations. Basically parallelism on a single core that programmers get for free. 4.) More cores/threads, something that programmers have to intentionally design around, has memory overhead (locks), and has diminishing returns for most programs (Amdahl's law).

@christopherleadholm6677 2 жыл бұрын

"My mom- my momma says bad code is for the devil!" - Adam Sandler as Water Boy

@wbtittle 2 жыл бұрын

Once upon a time, I was an entry level engineer for Bettis Atomic Labs. They gave us a tour of the facilities. As we were wandering the warehouse, our guide pointed to the 32,000 processor computer they were planning on using to design atomic reactors (I made that part up; they were just planning on trying to figure out how to use a 32,000 processor machine). They were trying to work out how to program such a machine. Then we moved down the warehouse 20 ft. "This is our 128,000 processor machine." "Why did you buy a 128,000 processor machine before you figured out how to code the 32,000 processor machine?" "Because it is bigger and better!" The hurdle of making a 32,000 processor machine work is much, much bigger than the hurdle of making a 128,000 processor machine work once you have figured out how to make 32,000 processors work.

@cloe811 Жыл бұрын

Where did you get all this information from for your research? Awesome video

@TomJones-be5ny 2 жыл бұрын

This was very interesting, thanks

@HuntingKingYT 2 жыл бұрын

"Any computer in the last 10 years" - My pc, Dual-Core i3-2120, 10y/o

@saricubra2867 2 жыл бұрын

My 4 core 8 thread i7-4700MQ made in 2013 looks like a last gen Threadripper in comparison.

@youtubeshadowbannedme 2 жыл бұрын

@@saricubra2867 i7 4700MQ isn't as fast as you think it is, and it definitely cannot compare to i7 4790K. your i7 chip is around the level of i7 2600K at best, but realistically it's probably closer to i5 2500K. this is of course assuming you didn't win the silicon lottery by a big margin. you would need at least i7 7700HQ to match the i7 4790K's performance at base speed.

@saricubra2867 2 жыл бұрын

@@youtubeshadowbannedme My i7 outperforms that i5. And yes, it's between the 2600K, or i7-3770K I never said that it's equivalent to the 4790K.

@saricubra2867 2 жыл бұрын

@@youtubeshadowbannedme 2500K lacks hyperthreading lmao.

@saricubra2867 2 жыл бұрын

@@youtubeshadowbannedme I tested a family member's laptop with the i7-7700HQ and yes, it's kinda a 4790K at stock. On average, laptop CPUs are two years behind equivalent high end i7 from desktop, that changed with 11th gen core generation and 12th too, the gap is smaller. For example, the i7-11800H without throttling outperforms the 10700K that was launched before by one year.

@kimobrien. 2 жыл бұрын

You can't have unlimited numbers of transistors, because eventually you get down to the atomic level. The same goes for clock speed: eventually the distance traveled across a processor from one side to the other is a quarter wavelength at the clock frequency. Then the distance the signal travels becomes important. The size of a chip is also limited to about the size of a fingernail.
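
To put numbers on the quarter-wavelength point above, here is a small Python sketch. It uses the speed of light in a vacuum as an upper bound; the `velocity_factor` parameter is an assumption standing in for the slower signal propagation in real on-chip wiring, and the 0.5 used below is illustrative rather than measured.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum value, an upper bound for any signal


def quarter_wavelength_mm(clock_hz, velocity_factor=1.0):
    """Quarter of the wavelength at `clock_hz`, in millimetres.

    velocity_factor scales the propagation speed (1.0 = vacuum speed of light);
    signals in real interconnect travel slower, so values below 1.0 are more realistic.
    """
    wavelength_m = SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz
    return wavelength_m / 4 * 1000


if __name__ == "__main__":
    for ghz in (1, 3, 5):
        ideal = quarter_wavelength_mm(ghz * 1e9)
        slowed = quarter_wavelength_mm(ghz * 1e9, velocity_factor=0.5)
        print(f"{ghz} GHz: {ideal:.1f} mm in vacuum, {slowed:.1f} mm at half light speed")
```

At 5 GHz the quarter wavelength is about 15 mm even at the vacuum speed of light, which is already in the range of a large die, so the fingernail-sized limit and the clock limit in the comment are closely related.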

@vadimuha Жыл бұрын

There's subatomic level. It's great at parallel computation

@Four-S 2 жыл бұрын

Bruh why don't you have at least 100k subscribers, this is a great video

@bryanparks6958 2 жыл бұрын

Nice. Succinct. And easy to understand.

@kyleeames8229 2 жыл бұрын

I'm just gonna guess before I see your explanation. Firstly, there are actually relatively few computational problems that can be more efficiently solved with lots of parallelization. Secondly, once core counts go above a certain limit, your chip either has to be really big, or you need an unreasonably large cooling system to keep it from melting a hole in your floor. Ok, I'll see if I'm right!

@paklekj4429 2 жыл бұрын

Had to refill the liquid nitrogen every 30min lol

@thelazarous 2 жыл бұрын

Well the temperature thing has already been kinda debunked. The original Pentium D is a perfect example; 2 cores, 2x the thermal load. But that's not really a problem with modern dual, quad, or even octuple cores. Today 32 cores requires 250w, in 20 years it'll take 25-50w. 20 years ago 8 full cores on a single package was considered stupid as nothing would ever even use them and if they did they'd melt, now I have 8 full cores in my laptop and they spend plenty of time at 100% usage.

@harvey66616 2 жыл бұрын

_"there are actually relatively few computational problems that can be more efficiently solved with lots of parallelization"_ -- uh, what? The class of problems suitable to SIMD architecture is quite large. It's been a significant chunk of research for decades. Modern graphics cards exist, and are in short supply, _because_ there are so many useful applications for that architecture, not just gaming. Indeed, the neural network machine learning space alone has myriad applications. And that's just one sub-genre of the larger picture.

@MindCaged 2 жыл бұрын

I still remember having those single-core processors for years and the really annoying problem where the computer would freeze because whatever program was running got stuck in an intensive processing loop or even just an infinite loop and was basically hogging the single-core to itself not letting anything else run. It was such a relief even when I got my first dual core, and I was wondering where this had been for so many years. Now I have a quad-core and to be honest I have to have a lot of programs running at once to fully utilize it, or maybe I have to find some program that can actually utilize all the cores at once, which isn't that many. Also, even if I could find one, it'd probably hit a different bottleneck in either memory access or file access speed.

@1over137 Жыл бұрын

I know you are simplifying but multiple parallel executions have been possible in single cores for a lot longer than we have had multi-cores. There are many CPU tasks which take many clock cycles. Some of those tasks can be executed in parallel with other instructions. Instruction pipelining, speculative execution etc, all work in single cores resulting in an IPC (instructions per clock) greater than 1. As to whether a hardware context switch could occur within the pipelining ... my understanding is that, "hyper threading" is a relatively recent thing, but it exists.

@Ferrari255GTO 2 жыл бұрын

The sweet spot for most consumers is 8 cores imo. Most games don't need more, and assuming your CPU is fairly modern, it will be perfectly capable of doing whatever you require of it without issues. It won't be an oven, but it will still need some decent cooling, and since it's an 8 core it won't be top of the line, making it cheaper than other CPUs while delivering a really good experience. What I mean is, don't just get the biggest thing you can; it might not be as convenient as you think.

@lawrencedoliveiro9104 2 жыл бұрын

According to the top500 list, the current fastest supercomputer in the world, RIKEN’s Fugaku, has 7,630,848 cores. Of course, they’re not x86 cores, they’re ARM. And it’s not running Windows, it’s Linux. That might help.

@mikapeltokorpi7671 2 жыл бұрын

Not in single silicon, though.

@lawrencedoliveiro9104 2 жыл бұрын

@@mikapeltokorpi7671 Not sure why that’s relevant.

@Conenion 2 жыл бұрын

> That might help. Minor. What /really/ helps is that these HPC machines were built with special purposes in mind. These machines typically run algorithms that scale very well. Like for example solving systems of linear equations. Number crunching stuff.

@lawrencedoliveiro9104 2 жыл бұрын

@@Conenion The problems scale, up to a point. That’s why a supercomputer needs a high-performance interconnect which makes up such a big part of its cost. If it wasn't for that, a supercomputer would not be much different from, say, a server farm.

@Conenion 2 жыл бұрын

@@lawrencedoliveiro9104 True. They need a high-performance interconnect because Amdahl's law would kick in much earlier without.

@mryodak 2 жыл бұрын

LLL: "Computers Can't Have Thousands of Cores" GPUs: Am I a joke to you?

@hjups 2 жыл бұрын

GPUs technically don't have thousands of cores either. The Titan V only has 80 (the SM is the equivalent to a CPU core, not a "CUDA Core").

@mryodak 2 жыл бұрын

@@hjups SMs (Streaming Multiprocessors) are just collections of CUDA cores as far as I know. And Radeon calls theirs stream processors, and they also have thousands of them.

@hjups 2 жыл бұрын

@@mryodak That's correct. But they are not "cores", they are ALUs. Put it this way.... you can either claim that the Titan V has 5120 cores and the 5900x has 816, or you can claim that the Titan V has 80 cores and the 5900x has 12.

@Conenion 2 жыл бұрын

GPUs don't have cores. That is simply wrong. They have very small computing units, but many. The entire GPU architecture is targeted towards making a single thing fast, i.e. the graphics pipeline. It can be used for some special number crunching stuff (GPGPU) but that is not what the people who designed GPUs had in focus. When programming for a GPGPU you use a very special style of programming and you have to do a lot of things "by hand".

@mryodak 2 жыл бұрын

@@Conenion CUDA is C++, OpenGL is C++, Vulkan is C. Other than being parallel and having its own instruction set, what's the difference?

@boblake2340 Жыл бұрын

Excellent presentation!

@rogerlong1845 Жыл бұрын

Really good explanation of this…

@occapella8643 2 жыл бұрын

At its most basic level, a CPU is just a rock that we trapped lightning inside of and tricked into thinking.

@xCwieCHRISx 2 жыл бұрын

If the apocalypse comes those magical stones are very valuable.

@Kevin-jb2pv Жыл бұрын

"Can Intel make a processor with 1,000 or more cores?" Yeah. They're called GPU's. I know a GPU is different as far as what it's designed for, but fundamentally it's the same concept just optimized for different tasks. I'm pretty sure that if you had the time, skills, and desire, you could take a GPU (the chip, not necessarily the whole card) and design a Turing complete computer around it functioning as the CPU. It would suck and be super limited and totally not worth the effort, but it would technically still be a computer.