I have a computer architecture exam late this morning and woke up extra early to go to the hospital for a visit. I'm watching this video while I'm waiting 🙌🏻
@NERDfirst 1 year ago
Hello and thank you for your comment! Do take care and all the best for your exam =)
@Ans3lm0777 2 years ago
The explanations in your videos are so crisp. Really appreciate the quality of these - keep it up :)
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@KumaAdventure 1 year ago
Thank you, this helped clarify some things I came across for the CompTIA A+ exam. Much appreciated.
@NERDfirst 1 year ago
You're welcome! Very happy to be of help :)
@akioasakura3624 1 year ago
THANK YOU SIR!! I made many Minecraft CPUs when I was 13. Back then there weren't many videos or resources that didn't explain pipelining in terms of "car assembly lines" or "laundry", or that weren't 4000-page university PDFs from the 90s. Thank you so much, good sir.
@NERDfirst 1 year ago
You're welcome! Very happy to be of help =) I think those are fairly textbook explanations so it's no wonder you see them a lot. Analogies are good too I suppose, but I guess nothing beats visualizing it properly!
@akioasakura3624 1 year ago
@NERDfirst I struggled with this for so long, but thanks to you maybe I can try playing Minecraft again. Have a good day!!
@NERDfirst 1 year ago
Good luck! Consider planning out your design first using actual logic components before doing it in game. Redstone is a whole different level of complexity!
@akioasakura3624 1 year ago
@NERDfirst ohh alright, thanks!!
@123jimenez99 2 years ago
Amazing video, it really made me understand why the PPE cores used in both CELL and Xenon were so underwhelming: they suffered from all the bad stuff mentioned in this video, including long pipelines, lots of stalls, lack of out-of-order execution and more. It also made me realize how important it was to rely on the SPEs as much as possible in CELL's case, which BTW was a big PITA. Cool stuff.
@NERDfirst 2 years ago
Oh wow, this is a great case study, thank you for sharing! Its pipeline is 23 stages! Really interesting to read about.
@123jimenez99 2 years ago
@NERDfirst Prescott P4: Hold my beer!
@NERDfirst 2 years ago
At least that's x86 - a CISC instruction set so it's less out of place!
@mill4340 1 year ago
I've completely forgotten all of this despite having studied it for Comp Arch class. Your video was the refresher I needed. Thank you.
@NERDfirst 1 year ago
You're welcome! Glad to be of help =)
@juanmanuelserna7692 10 months ago
Great quality video, easy to understand for people who do not come from the computer science world, great job!
@NERDfirst 10 months ago
Hello and thank you very much for your comment! Glad you liked the video :)
@akkudakkupl 1 year ago
That's not the only reason for pipelining. You could build a CPU that does the whole instruction in one clock (one rising, one falling edge), but propagation time would still limit the max clock speed (and computation speed). Pipelining lets you break up the propagation path into smaller chunks and raise the clock speed.
@NERDfirst 1 year ago
Hello and thank you for your comment! To be fair, increasing clock speed this way isn't going to increase the overall speed of computation - No point getting your clock speeds up to 20GHz if every instruction has to make its way through 100 pipeline stages! Ultimately it's less about managing propagation delay - In fact having multiple pipeline stages _increases_ the total per-instruction propagation delay since it makes the circuitry more complex. The advantage comes about from the "parallelism" where we essentially start on the next instruction before the last one is complete.
@akkudakkupl 1 year ago
@NERDfirst Let's say you have an ALU that has 100 ns propagation. Now you split that up into two 50 ns steps with some latches in between. You've just almost doubled your instructions per second by doubling the clock rate. This is pipelining and its most important reason. What you are referencing is superscalarity and out-of-order execution - the use of multiple execution units to their full extent.
@NERDfirst 1 year ago
I think we're talking about the same things using different words, or maybe I just wasn't explicit enough on the point. My way of explaining it (at 3:32) assumes that pipeline stages exist but instructions are processed to completion before the next instruction enters the pipeline. Your way of explaining it does away with the pipeline model and considers the execution of an instruction as a single large step. I didn't explicitly mention propagation delay by name to reduce cognitive load, but I do believe the understanding conveyed is the same. If I understand your explanation correctly, you get a doubling of instructions per second _because_ of instruction-level parallelism. At the end of the day, if you double the clock speed but each instruction takes two clock cycles to complete, the number of instructions per second is exactly the same. It is because of superscalarity allowing you to have multiple instructions in the ALU at once that you can have a performance benefit. Do let me know if I'm understanding you wrongly. It's been a while since I did this stuff.
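A rough sketch of the arithmetic being debated here, in Python. The function names, the 1000-instruction count and the stall-free assumption are made up for illustration; only the 100 ns / 50 ns split comes from the thread. It compares one 100 ns step per instruction, two 50 ns steps with no overlap, and two 50 ns steps with overlap:

```python
def cycles_without_overlap(num_instructions: int, num_stages: int) -> int:
    """Each instruction runs through every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_with_overlap(num_instructions: int, num_stages: int) -> int:
    """Ideal pipeline (no stalls): fill it once, then one instruction finishes per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def instructions_per_second(num_instructions: int, cycles: int, clock_hz: float) -> float:
    """Average throughput over the whole run."""
    return num_instructions * clock_hz / cycles


N = 1000  # hypothetical instruction count

# One 100 ns step per instruction, clocked at 10 MHz.
single_step = instructions_per_second(N, cycles_without_overlap(N, 1), 10e6)
# Two 50 ns steps at 20 MHz, next instruction waits for the previous one to finish.
split_no_overlap = instructions_per_second(N, cycles_without_overlap(N, 2), 20e6)
# Same two 50 ns steps at 20 MHz, but the steps of adjacent instructions overlap.
split_overlap = instructions_per_second(N, cycles_with_overlap(N, 2), 20e6)

print(f"single step:        {single_step:.3e} instr/s")
print(f"split, no overlap:  {split_no_overlap:.3e} instr/s")
print(f"split, overlapped:  {split_overlap:.3e} instr/s")
```

Without overlap, doubling the clock changes nothing; with overlap, throughput approaches double as the instruction count grows.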
@akkudakkupl 1 year ago
@NERDfirst In my example my single ALU can be in two discrete steps of executing two instructions - the first half of a new instruction and the second half of an older instruction. You can imagine my pipeline like this (a modification of the classic RISC pipeline): Fetch -> Decode -> Execution 1 -> Execution 2 -> Memory -> Write Back. I have divided the execution stage in two. This is because my hypothetical ALU would have 100 ns of propagation and would limit the clock to 10 MHz. By splitting it up I now have a slightly longer pipeline, but my largest propagation delay goes down to, let's say, 55 ns (because we had to add latches in between stages, it's not exactly half). Now my CPU can run at 18 MHz. Both of those frequencies roughly translate to instructions per second because in both cases the instructions complete "in a single cycle" due to pipelining. This is the advantage of longer pipelines - as long as you get an uninterrupted stream of instructions you can get a boost in IPS because you have a higher max clock. This is of course not ideal because you have branches in the code, and those stall or flush the pipeline.
You are executing multiple instructions at a time because the result of one step is passed on to be computed in the next - basically it's an improvement over very old CPUs that executed those steps one after another (because pipelining needs additional circuitry), so you got one instruction in, for example, 4 clocks. But you can't compute more instructions at a time than you have pipeline stages. For that you need superscalarity - having multiple ALUs, multiple address generation units, etc. working at the same time - and to make it work right you also use out-of-order execution, so you can fill up those elements' pipelines (yes, everything is pipelined in a modern CPU).
What I was implying earlier was that a Harvard architecture CPU could execute a full instruction in a single clock - because both instruction and data are supplied at the same time - but it might not run at a very fast clock because data has to propagate through the whole datapath in that one clock cycle.
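The clock figures in this comment can be reproduced with a small Python sketch. It assumes the slowest stage alone sets the clock period; every stage delay other than the 100 ns ALU and its 55 ns halves is a made-up placeholder:

```python
def max_clock_hz(stage_delays_ns):
    """The clock period must cover the slowest stage, so that stage sets the ceiling."""
    slowest_ns = max(stage_delays_ns)
    return 1e9 / slowest_ns


# Hypothetical per-stage propagation delays in nanoseconds.
# Only the 100 ns execute stage and the 55 ns halves come from the comment.
unsplit = {"fetch": 40, "decode": 30, "execute": 100, "memory": 50, "write_back": 20}
split = {
    "fetch": 40, "decode": 30,
    "execute_1": 55, "execute_2": 55,  # 50 ns of logic plus latch overhead
    "memory": 50, "write_back": 20,
}

unsplit_hz = max_clock_hz(unsplit.values())
split_hz = max_clock_hz(split.values())

print(f"unsplit: {unsplit_hz / 1e6:.2f} MHz")
print(f"split:   {split_hz / 1e6:.2f} MHz")
print(f"peak throughput gain with a full pipeline: {split_hz / unsplit_hz:.2f}x")
```

The gain is a best case: it only holds while the pipeline stays full, so stalls and flushes from branches eat into it.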
@jefferybarnett1849 1 year ago
Thanks for enlightening me about heuristics. I loved the graphical representation of the "shifts" in your presentation on pipelines and "stalls" that happen and avoiding them along the way. I knew just a moment before you showed us that the instructions were about to be reordered. My understanding has been improved. My knowledge of assembly language helped a lot, I just never bothered to look into the matter as you have done. Thanks a lot.
@NERDfirst 1 year ago
Hello and thank you very much for your comment! Glad you enjoyed the video, and really appreciate you sharing your "aha" moment - That's one of the things I live for as an educator =)
@ArneChristianRosenfeldt 1 year ago
Heuristics make me want to see a CPU (simulation) where the scalar CPU splits up into two threads at every branch (becomes superscalar). Store commands write into a FIFO! Then when the branch condition is clear, a whole tree of threads is flushed. The store FIFO of the taken branch is flushed to memory. This might be a useful operation mode for those 16-core RISC-V chips.
@Atharv0812 2 years ago
Your content is so professional. Can you also make videos on modern microprocessor architectures like i3, i5, i7, etc.?
@NERDfirst 2 years ago
Hello and thank you for your comment! Unfortunately those architectures are far more complex (some modern architectures have twenty or more pipeline stages) so I haven't gotten round to learning about them.
@awayfrom90 7 months ago
Superb explanation 🎉
@NERDfirst 7 months ago
Hello and thank you very much for your comment! Very happy to be of help :)
@LegonTW0 1 year ago
Thanks, boss, clear as a glass of water, love you
@NERDfirst 1 year ago
Hello and thank you for your comment! Glad to be of help =)
@AshtonvanNiekerk 1 year ago
Very well explained.
@NERDfirst 1 year ago
Hello and thank you for your comment! Very happy to be of help =)
@itznukeey 1 year ago
Great explanation, thanks
@NERDfirst 1 year ago
You're welcome! Glad to be of help =)
@cheenoong9228 1 year ago
Why do I see in some materials that the order of the process is IF (Instruction Fetch) -> ID (Instruction Decode) -> EX (Instruction Execute) -> MEM (Access Memory Operand) -> WB (Write Back)?
@NERDfirst 1 year ago
Hello and thank you for your comment! If I'm not wrong, what you've described is specifically the MIPS pipeline. Different architectures can have a different number and order of pipeline stages, so this isn't universal. What I've shown in the video isn't linked to any specific assembly architecture, it's just a generic abstract pipeline to make understanding things easier.
@dimnai 2 years ago
Great video, well done!
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video :)
@DReam-mn4mj 10 months ago
Great video, keep it up!
@NERDfirst 10 months ago
Hello and thank you very much for your comment! Glad you liked the video :)
@fraewn2617 2 years ago
well put
@NERDfirst 2 years ago
Thank you very much! Glad you liked the video :)
@Epic-so3ek 2 years ago
these videos are really good
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@robot67799 2 years ago
Great content 👍
@NERDfirst 2 years ago
Hello and thank you very much for your comment! Glad you liked the video =)
@Brekstahkid 4 months ago
Good stuff
@NERDfirst 4 months ago
Thank you! Glad you liked the video :)
@memeingthroughenglish7221 4 months ago
Damn, your videos are so nice!!!
@NERDfirst 4 months ago
Thank you very much! I remember your comment on another one of my videos as well, glad to know you like my work =)
@JedJarin 25 days ago
thank you
@NERDfirst 25 days ago
You're welcome! Glad to be of help :)
@adamchalkley956 1 year ago
I have a question: not all instructions have a write back, i.e. they don't write results back to registers, memory, etc. For example, on the 8080, JMP instructions do not write back anywhere. Another example would be a MOV instruction, which moves data from memory/registers to registers/memory. So what happens when an instruction has no write back? Does it execute a no-op? Again, I'm still quite the novice, thanks.
@NERDfirst 1 year ago
Hello and thank you for your comment! Yes, instructions that don't require any action to be taken on any stage would still have to go through the stage, but will do nothing there.
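As a toy illustration of that answer, here is a minimal Python sketch of an ideal, stall-free five-stage pipeline. The stage names, opcodes and the `ACTIVE_STAGES` table are hypothetical and not tied to the 8080 or any real instruction set; an instruction with nothing to do in a stage still occupies it for one cycle, marked `(idle)`:

```python
STAGES = ["fetch", "decode", "execute", "memory", "write_back"]

# Hypothetical table: the stages in which each opcode does real work.
ACTIVE_STAGES = {
    "ADD": {"fetch", "decode", "execute", "write_back"},
    "LOAD": {"fetch", "decode", "execute", "memory", "write_back"},
    "JMP": {"fetch", "decode", "execute"},  # nothing to write back
}


def trace(program):
    """Return, per clock cycle, what occupies each stage of a stall-free pipeline."""
    total_cycles = len(STAGES) + len(program) - 1 if program else 0
    timeline = []
    for cycle in range(total_cycles):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            # Instruction i enters stage s at cycle i + s.
            instr_index = cycle - stage_index
            if 0 <= instr_index < len(program):
                opcode = program[instr_index]
                does_work = stage in ACTIVE_STAGES[opcode]
                row[stage] = opcode if does_work else f"{opcode} (idle)"
            else:
                row[stage] = "-"  # stage is empty this cycle
        timeline.append(row)
    return timeline


for cycle, row in enumerate(trace(["ADD", "JMP", "LOAD"])):
    print(cycle, row)
```

The JMP still takes its turn in the memory and write-back slots; it just passes through without changing anything, which keeps every instruction moving in lockstep.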
@adamchalkley956 1 year ago
@NERDfirst Thanks, that makes sense
@cyprienvilleret2266 2 years ago
great thanks
@NERDfirst 2 years ago
You're welcome! Glad to be of help :)
@bahrikeskin5824 1 year ago
Could you change the song please? My brain is burning because of it :( But I understand the concept, thanks :) Like
@NERDfirst 1 year ago
Oh sorry about that! I compared levels with popular YouTubers and realized my BGM was turned down much lower than theirs. I'd hoped for it to be out of the way but it looks like you still picked up on it. I'll see what I can do for future videos!