CPU Pipelining - The cool way your CPU avoids idle time!

Views: 12,417

A day ago

Comments: 66

@Eujinv 1 year ago

I have a computer architecture exam late this morning and woke up extra early to go to the hospital for a visit. I'm watching this video while I'm waiting 🙌🏻

@NERDfirst 1 year ago

Hello and thank you for your comment! Do take care and all the best for your exam =)

@Ans3lm0777 2 years ago

The explanations in your videos are so crisp. Really appreciate the quality of these - keep it up :)

@NERDfirst 2 years ago

Hello and thank you very much for your comment! Glad you liked the video =)

@123jimenez99 2 years ago

Amazing video, it really made me understand why the PPE cores used in both CELL and Xenon were so underwhelming. They really suffered from all the bad stuff mentioned in this video: long pipelines, lots of stalls, lack of out-of-order execution and more. It also made me realize how important it was to rely on the SPEs as much as possible in CELL's case, which BTW was a big PITA. Cool stuff.

@NERDfirst 2 years ago

Oh wow, this is a great case study, thank you for sharing! Its pipeline is 23 stages! Really interesting to read about.

@123jimenez99 2 years ago

@@NERDfirst Prescott P4: Hold my beer!

@NERDfirst 2 years ago

At least that's x86 - a CISC instruction set so it's less out of place!

@KumaAdventure 1 year ago

Thank you, this helped clarify some things I came across for the CompTIA A+ exam. Much appreciated.

@NERDfirst 1 year ago

You're welcome! Very happy to be of help :)

@akioasakura3624 1 year ago

THANK YOU SIR!! I made many Minecraft CPUs when I was 13. Back then there weren't many videos or resources that didn't explain pipelining in terms of "car assembly lines" or "laundry", or 4000-page university PDFs from the 90s. Thank you so much, good sir.

@NERDfirst 1 year ago

You're welcome! Very happy to be of help =) I think those are fairly textbook explanations so it's no wonder you see them a lot. Analogies are good too I suppose, but I guess nothing beats visualizing it properly!

@akioasakura3624 1 year ago

@@NERDfirst i struggled with this for so long. but thanks to u maybe i can try playing minecraft again. have a good day!!

@NERDfirst 1 year ago

Good luck! Consider planning out your design first using actual logic components before doing it in game. Redstone is a whole different level of complexity!

@akioasakura3624 1 year ago

@@NERDfirst ohh alright, thanks!!

@mill4340 1 year ago

I had completely forgotten all of this after studying it for my Comp Arch class. Your video was the refresher I needed. Thank you.

@NERDfirst 1 year ago

You're welcome! Glad to be of help =)

@juanmanuelserna7692 1 year ago

Great quality video, easy to understand for people who do not come from the computer science world. Great job!

@NERDfirst 1 year ago

Hello and thank you very much for your comment! Glad you liked the video :)

@akkudakkupl 1 year ago

That's not the only reason for pipelining. You could build a CPU that does the whole instruction in one clock (one rising, one falling edge). But you still have propagation time that limits max clock speed (and computation speed). Pipelining lets you break up the propagation into smaller chunks and raise clock speeds.

@NERDfirst 1 year ago

Hello and thank you for your comment! To be fair, increasing clock speed this way isn't going to increase the overall speed of computation - No point getting your clock speeds up to 20GHz if every instruction has to make its way through 100 pipeline stages! Ultimately it's less about managing propagation delay - In fact having multiple pipeline stages _increases_ the total per-instruction propagation delay since it makes the circuitry more complex. The advantage comes about from the "parallelism" where we essentially start on the next instruction before the last one is complete.

@akkudakkupl 1 year ago

@@NERDfirst Let's say you have an ALU that has 100 ns of propagation delay. Now you split that up into two 50 ns steps with some latches in between. You just almost doubled your instructions per second by doubling the clock rate. This is pipelining and its most important reason. What you are referencing is superscalarity and out-of-order execution - the use of multiple execution units to their full extent.

@NERDfirst 1 year ago

I think we're talking about the same things using different words, or maybe I just wasn't explicit enough on the point. My way of explaining it (at 3:32) assumes that pipeline stages exist but instructions are processed to completion before the next instruction enters the pipeline. Your way of explaining it does away with the pipeline model and considers the execution of an instruction as a single large step. I didn't explicitly mention propagation delay by name to reduce cognitive load, but I do believe the understanding conveyed is the same. If I understand your explanation correctly, you get a doubling of instructions per second _because_ of instruction-level parallelism. At the end of the day, if you double the clock speed but each instruction takes two clock cycles to complete, the number of instructions per second is exactly the same. It is because of superscalarity allowing you to have multiple instructions in the ALU at once that you can have a performance benefit. Do let me know if I'm understanding you wrongly. It's been a while since I did this stuff.

@akkudakkupl 1 year ago

@@NERDfirst In my example, my single ALU can be in two discrete steps of executing two instructions: the first half of a new instruction and the second half of an older instruction. You can imagine my pipeline like this (a modification of the classic RISC pipeline): Fetch -> Decode -> Execution 1 -> Execution 2 -> Memory -> Write Back. I have divided the execution stage in two. This is because my hypothetical ALU would have 100 ns of propagation delay and would limit the clock to 10 MHz. By splitting it up I now have a slightly longer pipeline, but my largest propagation delay went down to, let's say, 55 ns (because we had to add latches in between stages, it's not exactly half). Now my CPU can run at 18 MHz. Both of those frequencies roughly translate to instructions per second, because in both cases the instructions complete "in a single cycle" due to pipelining.
This is the advantage of longer pipelines: as long as you get an uninterrupted stream of instructions, you can get a boost in IPS because you have a higher max clock. This is of course not ideal, because you have branches in the code and that stalls or flushes the pipeline.
You are executing multiple instructions at a time because the result of one step is transferred further on to be computed in the next. Basically it's an improvement over very old CPUs, which executed those steps one after another (because pipelining needs additional circuitry), so you got one instruction every, for example, 4 clocks. But you can't compute more instructions at a time than you have pipeline stages. For that you need superscalarity (having multiple ALUs, multiple address generation units, etc. working at the same time), and to make it work right you also use out-of-order execution, so you can fill up those elements' pipelines (yes, everything is pipelined in a modern CPU).
What I was implying earlier was that a Harvard architecture CPU could execute a full instruction in a single clock - because both instruction and data are supplied at the same time - but it might not run at a very fast clock because data has to propagate through the whole datapath in that one clock cycle.
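
The arithmetic in this exchange can be checked with a rough sketch. The 100 ns and 55 ns figures come from the comments above; every other stage delay, and the function names, are made-up assumptions for illustration only. The model is an idealized pipeline with no stalls or flushes.

```python
def max_clock_mhz(stage_delays_ns):
    """Highest usable clock in MHz: the slowest stage sets the cycle time."""
    return 1000.0 / max(stage_delays_ns)


def total_time_ns(n_instructions, stage_delays_ns, pipelined=True):
    """Time to finish n instructions, ignoring stalls and flushes.

    Every stage takes one cycle, and the cycle is as long as the slowest stage.
    """
    cycle = max(stage_delays_ns)
    stages = len(stage_delays_ns)
    if pipelined:
        # Fill the pipeline once, then one instruction completes per cycle.
        return (stages + n_instructions - 1) * cycle
    # No overlap: each instruction walks through every stage alone.
    return n_instructions * stages * cycle


# Hypothetical delays in ns; only the ALU figures (100, then 55 + 55) are from the thread.
single_ex = [40, 30, 100, 50, 20]     # Fetch, Decode, Execute, Memory, Write Back
split_ex = [40, 30, 55, 55, 50, 20]   # Execute split into Execution 1 / Execution 2

if __name__ == "__main__":
    for name, delays in (("single EX", single_ex), ("split EX", split_ex)):
        print(name, round(max_clock_mhz(delays), 2), "MHz,",
              total_time_ns(1000, delays), "ns for 1000 instructions")
    # Without overlap, halving the stage time but doubling the stage count changes nothing.
    print(total_time_ns(1000, [100], pipelined=False),
          total_time_ns(1000, [50, 50], pipelined=False))
```

Under these assumptions both points in the thread hold at once: a faster clock alone does not help if instructions never overlap, and once they do overlap, shortening the slowest stage is what raises throughput.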

@jefferybarnett1849 2 years ago

Thanks for enlightening me about heuristics. I loved the graphical representation of the "shifts" in your presentation on pipelines and "stalls" that happen and avoiding them along the way. I knew just a moment before you showed us that the instructions were about to be reordered. My understanding has been improved. My knowledge of assembly language helped a lot, I just never bothered to look into the matter as you have done. Thanks a lot.

@NERDfirst 2 years ago

Hello and thank you very much for your comment! Glad you enjoyed the video, and really appreciate you sharing your "aha" moment - That's one of the things I live for as an educator =)

@ArneChristianRosenfeldt 1 year ago

Heuristics make me want to see a CPU (simulation) where the scalar CPU splits up into two threads at every branch (becomes superscalar). Store commands write into a FIFO! Then, when the branch condition is clear, a whole tree of threads is flushed. The store FIFO of the taken branch is flushed to memory. This might be a useful operation mode for those 16-core RISC-V chips.

@Atharv0812 2 years ago

Your content is so professional. Can you also make videos on modern microprocessor architectures like i3, i5, i7, etc.?

@NERDfirst 2 years ago

Hello and thank you for your comment! Unfortunately those architectures are far more complex (some modern architectures have twenty or more pipeline stages) so I haven't gotten round to learning about them.

@LegonTW0 1 year ago

Thanks, boss, crystal clear. Love you

@NERDfirst 1 year ago

Hello and thank you for your comment! Glad to be of help =)

@AshtonvanNiekerk 1 year ago

Very well explained.

@NERDfirst 1 year ago

Hello and thank you for your comment! Very happy to be of help =)

@cheenoong9228 1 year ago

Why do some materials give the order of the process as IF (Instruction Fetch) -> ID (Instruction Decode) -> EX (Instruction Execute) -> MEM (Access Memory Operand) -> WB (Write Back)?

@NERDfirst 1 year ago

Hello and thank you for your comment! If I'm not wrong, what you've described is specifically the MIPS pipeline. Different architectures can have a different number and order of pipeline stages, so this isn't universal. What I've shown in the video isn't linked to any specific assembly architecture, it's just a generic abstract pipeline to make understanding things easier.

@ArneChristianRosenfeldt 17 days ago

I think that MIPS tries to speed-up write back. When every value flows through the pipeline for 5 cycles, we can turn off power for that register for this time. Leakage should bring it to a middle state between on and off. Then we write back, which is still a little power hungry due to the fan-out, and then turn on power to let the bits flip into their intended states.

@awayfrom90 9 months ago

Superb explanation 🎉

@NERDfirst 9 months ago

Hello and thank you very much for your comment! Very happy to be of help :)

@galdali10 23 days ago

Great video!!!

@NERDfirst 23 days ago

Hello and thank you very much for your comment! Glad you liked the video :)

@itznukeey 1 year ago

Great explanation, thanks

@NERDfirst 1 year ago

You're welcome! Glad to be of help =)

@dimnai 2 years ago

Great video, well done!

@NERDfirst 2 years ago

Hello and thank you very much for your comment! Glad you liked the video :)

@DReam-mn4mj 11 months ago

Great video, keep it up!

@NERDfirst 11 months ago

Hello and thank you very much for your comment! Glad you liked the video :)

@memeingthroughenglish7221 5 months ago

Damn, your videos are so nice!!!

@NERDfirst 5 months ago

Thank you very much! I remember your comment on another one of my videos as well, glad to know you like my work =)

@fraewn2617 2 years ago

well put

@NERDfirst 2 years ago

Thank you very much! Glad you liked the video :)

@Epic-so3ek 2 years ago

these videos are really good

@NERDfirst 2 years ago

Hello and thank you very much for your comment! Glad you liked the video =)

@robot67799 2 years ago

Great content 👍

@NERDfirst 2 years ago

Hello and thank you very much for your comment! Glad you liked the video =)

@JedJarin 2 months ago

thank you

@NERDfirst 2 months ago

You're welcome! Glad to be of help :)

@cyprienvilleret2266 2 years ago

great thanks

@NERDfirst 2 years ago

You're welcome! Glad to be of help :)

@Brekstahkid 6 months ago

Good stuff

@NERDfirst 6 months ago

Thank you! Glad you liked the video :)

@adamchalkley956 1 year ago

I have a question: not all instructions have a write back, i.e. they don't write results back to registers, memory, etc. For example, on the 8080, jmp instructions do not write back to anywhere. Another example would be a MOV instruction, which moves data from memory/registers to registers/memory. So what happens when an instruction has no write back? Does it execute a no-op? Again, I'm still quite the novice, thanks

@NERDfirst 1 year ago

Hello and thank you for your comment! Yes, an instruction that doesn't need to do anything at a particular stage still has to pass through that stage, but it does nothing there.
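
As a rough illustration of that answer, here is a toy model of a five-stage pipeline using the IF/ID/EX/MEM/WB names quoted earlier in the comments. The table of which stages do real work for each instruction is a hypothetical simplification, not taken from any real architecture or from the video.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Hypothetical: the stages in which each kind of instruction does real work.
ACTIVE = {
    "add": {"IF", "ID", "EX", "WB"},
    "load": {"IF", "ID", "EX", "MEM", "WB"},
    "jmp": {"IF", "ID", "EX"},  # no memory access, nothing to write back
}


def schedule(program):
    """Return one dict per clock cycle mapping each stage to what it holds."""
    n_cycles = len(program) + len(STAGES) - 1
    table = []
    for cycle in range(n_cycles):
        row = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # which instruction sits in this stage now
            if 0 <= index < len(program):
                op = program[index]
                # The instruction still occupies the stage even with nothing to do.
                row[stage] = op if stage in ACTIVE[op] else op + "(idle)"
            else:
                row[stage] = "-"  # stage is empty (pipeline filling or draining)
        table.append(row)
    return table


if __name__ == "__main__":
    for cycle, row in enumerate(schedule(["add", "jmp", "load"])):
        print(cycle, " ".join(f"{stage}={row[stage]:<10}" for stage in STAGES))
```

In this sketch the jmp still spends a cycle in MEM and in WB, marked as idle, so the instructions behind it keep their one-stage-per-cycle rhythm.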

@adamchalkley956 1 year ago

@@NERDfirst Thanks, that makes sense

@bahrikeskin5824 1 year ago

Could you change the song please? My brain is burning because of it :( But I understand the concept, thanks :) Like

@NERDfirst 1 year ago

Oh sorry about that! I compared levels with popular YouTubers and realized my BGM was turned down much lower than theirs. I'd hoped for it to be out of the way, but it looks like you still picked up on it. I'll see what I can do for future videos!

@bahrikeskin5824 1 year ago

@@NERDfirst thanks