5-Stage Pipeline Processor Execution Example

192,148 views

Matthew Watkins

Days ago

Comments: 66

@richardhall9815 5 years ago

I didn't know Tom Hanks made videos about instruction pipelining in his free time!

@zackjohnson9387 4 years ago

Tony Stark*

@glykeriatheodorou7586 4 years ago

hahahaha actually their voices are very similar and I just noticed it 😂😂😂

@jilha1122 3 years ago

Wow now that you mention it...

@Spinogrl2000 2 years ago

Thank you so much! KZbin has taught me more about pipelines and data paths in an hour than my prof. has this whole month. They make it so much harder than it has to be!! Again, thank you!

@ad.i 10 months ago

I know it's been a whole year, but do you recall any specific videos that helped explain the topic that you think are definitely worth looking at? If not, that's all good but thanks a lot!

@darianxd5508 7 months ago

@@ad.i David and Sarah Harris

@MrICH99999 10 months ago

Still helping me in 2024 - big thanks!

@apex1608 6 months ago

true

@Vwcz 6 years ago

Great video, and active in comments section. Excellent content creator! This is what we need. Thanks

@kevinle4766 6 years ago

I am a bit confused: why does the iteration run from cycle 5 to cycle 18?

@MrXinchuan 7 years ago

I don't understand why you stall the first beq (the second instruction) but don't stall the lw (the fourth instruction) and instead let forwarding take care of it. The previous instructions, slt and add, both have their results ready after the execute stage.

@matthewwatkins88 7 years ago

In the processor this is dealing with, branches are resolved in the decode stage. In this case, that means the value of $t0 is needed in the decode stage. Since the instruction before the branch (the slt) writes to $t0, the value has to come from the slt instruction, and the result of the slt isn't available until the end of the EX stage. So the first beq has to stall a cycle so that it can get the correct value forwarded from EX into the ID stage. The lw doesn't need to stall because it doesn't need the value of $t0 until the EX stage (whereas the branch needed it in the ID stage). In this case, the add instruction has completed the EX stage before the lw enters the EX stage, so no stalling is needed (the value is simply forwarded).
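A minimal sketch of the timing rule in the reply above, assuming the classic stages F, D, E, M, W numbered 0 to 4, that an ALU result can first be forwarded in the cycle its producer spends in M, and that a decode-resolved branch needs its operand in D while a load needs its base register in E. The function name `stall_cycles` is hypothetical.

```python
# Stage indices for the classic 5-stage pipeline.
F, D, E, M, W = range(5)

def stall_cycles(ready_stage: int, need_stage: int, distance: int = 1) -> int:
    """Bubbles needed between a producer and a dependent consumer.

    ready_stage: stage the producer occupies when its result can first be forwarded.
    need_stage:  stage in which the consumer must already have the operand.
    distance:    how many instructions after the producer the consumer is fetched
                 (1 = the very next instruction).
    """
    # Producer fetched in cycle 0 reaches ready_stage in cycle ready_stage;
    # the consumer, fetched in cycle `distance`, reaches need_stage in cycle
    # distance + need_stage if nothing stalls it.
    return max(0, ready_stage - (distance + need_stage))

# slt -> beq: slt's result is forwardable once slt is in M; beq needs $t0 in D.
print("slt -> beq:", stall_cycles(M, D))  # 1
# add -> lw: lw needs $t0 (its base address) only in E.
print("add -> lw:", stall_cycles(M, E))   # 0
```

The single subtraction captures why the branch stalls and the load does not: moving the point of need one stage later (D to E) removes exactly one bubble.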

@shubhrajit2117 5 days ago

Yup, operands are forwarded to the ALU, so there should be no need for a stall.

@ahmadahm3513 3 years ago

With forwarding, beq should not have to wait for the data, because it can get the data passed through in the same cycle, unless you are assuming that the EX stage of slt and the ID stage of beq do not happen in the same cycle :)

@codewithven9391 2 months ago

This isn't true. The SLT's arithmetic is performed during cycle 3, so its result can't be forwarded until the end of cycle 3. BEQ therefore won't have the correct value in cycle 3, but in cycle 4 it can, once the result has been forwarded from the SLT's E/M pipeline register.

@Simppi96 7 years ago

This is a really great video, thanks! But I am still not sure on how data dependencies work. How do you know when a command has the data ready for another to use? For example, the BEQ command needs $t0 and it can get it after the SLT command has executed, but the next BEQ command has to wait for the LW command to get to the memory clock cycle. I would be very grateful for an answer, thanks in advance!

@matthewwatkins88 7 years ago

This partially depends on the implementation of the pipelined processor. For this example, it is assumed that for all instructions that produce data, except for load instructions, the data is available as the instruction moves from the execute (E) stage to the memory (M) stage. This means that, for the slt/beq combination, the SLT produces the data in execute, and that data can be forwarded from the beginning of the memory stage to the decode (D) stage (where it is needed for branches). For load instructions, the data is not available until after the instruction accesses the data memory, which means it is only available as the instruction moves from memory (M) to writeback (W). This is why the lw/beq combination has to wait another cycle: it is only as the lw moves into writeback that the data is available to forward to decode.
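The availability rule above can be tabulated in a short sketch, assuming stage indices F=0 through W=4, ALU results forwardable once the producer is in M, load data forwardable once the lw is in W, and a branch that resolves in D. The `READY` table and `stalls_before_branch` are hypothetical names for illustration.

```python
# Stage indices: F=0, D=1, E=2, M=3, W=4.
F, D, E, M, W = range(5)

# Stage a producer is in when its result can first be forwarded, per the reply:
# ALU results at the E->M boundary, load data at the M->W boundary.
READY = {"slt": M, "add": M, "lw": W}

def stalls_before_branch(producer: str, distance: int = 1) -> int:
    """Bubbles a decode-resolved beq needs to read the producer's result."""
    # The beq is fetched `distance` cycles after the producer and would reach D
    # in cycle distance + D; it must wait until the producer reaches READY[producer].
    return max(0, READY[producer] - (distance + D))

for producer in ("slt", "lw"):
    print(f"{producer} -> beq: {stalls_before_branch(producer)} stall cycle(s)")
# slt -> beq: 1 stall cycle(s)
# lw -> beq: 2 stall cycle(s)
```

The extra cycle for lw/beq comes entirely from the load's later `READY` stage (W instead of M).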

@rekr6381 6 years ago

@@matthewwatkins88 Thanks for this response, very helpful!!

@owenzhang7503 8 months ago

If we didn't have the last line, what would the pipeline look like? Could we begin the IF of the first loop instruction directly in cycle 14?

@matthewwatkins88 8 months ago

The last line, as I interpret it anyway, is never executed, so removing it really wouldn't change anything.

@owenzhang7503 8 months ago

@@matthewwatkins88 I see. Thank you very much!

@albaraam1873 1 year ago

I think in the third case you meant that only the first branch (beq $t0, $0, end) is taken.

@matthewwatkins88 1 year ago

Off the top of my head, I would agree with you.

@selvalooks 5 years ago

This is wonderful, a fantastic explanation of pipelining!!!!

@_nognom 6 years ago

The value of $t0 from the SLT instruction should be ready to forward in the latter half of stage E, which is right before the early half of stage E for the BEQ instruction. That suggests the value of $t0 will be forwarded to the ALU instead of requiring a stall. Is this not correct?

@matthewwatkins88 6 years ago

No, this is not correct. The result of the SLT (or of any instruction computed in the execute stage) is only ready at the end of the cycle, so it can really only be forwarded at the beginning of the next stage (the memory stage). Additionally, the BEQ needs the value of $t0 in the decode stage, since it resolves the branch in that stage. This means the branch cannot properly complete the decode stage until the previous SLT has completed the execute stage. *If* the branch were resolved in the execute stage (which is not the case here), a stall would not be necessary, as forwarding would take care of the dependency.
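The contrast between a decode-resolved and an execute-resolved branch can be checked with concrete cycle numbers. This sketch assumes slt is fetched in cycle 1 and beq in cycle 2, and that slt's result is first forwardable in the cycle slt spends in M; `branch_stalls` is a hypothetical helper.

```python
# Stage indices; cycles are counted from 1 with slt fetched in cycle 1 (assumed).
F, D, E, M, W = range(5)
SLT_FETCH = 1
BEQ_FETCH = SLT_FETCH + 1  # beq is the instruction right after slt

# slt computes $t0 during its E cycle, so the earliest cycle the value can be
# forwarded is the one slt spends in M.
forward_cycle = SLT_FETCH + M  # cycle 4

def branch_stalls(resolve_stage: int) -> int:
    """Bubbles before beq can read $t0 in the stage where it resolves the branch."""
    unstalled_cycle = BEQ_FETCH + resolve_stage  # cycle beq would reach that stage
    return max(0, forward_cycle - unstalled_cycle)

print("branch resolved in D:", branch_stalls(D))  # 1
print("branch resolved in E:", branch_stalls(E))  # 0
```

Without a stall the beq would sit in D during cycle 3, one cycle before the value exists; if the branch were resolved in E, it would need the value in cycle 4, exactly when it becomes forwardable.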

@trumpetperson11 4 years ago

@@matthewwatkins88 I had a similar thought. I have been told that you can forward the data directly to the ALU (or, more precisely, to the register between the D and E stages) for the calculation, overriding the data read from the register file in the D stage. That would give you enough time to avoid a stall there. Is that just not correct?

@ahmadahm3513 3 years ago

@@matthewwatkins88 That's not really true, because the result of each stage could be ready in the first half of the cycle, as in the WB stage.

@wendyli6238 6 years ago

Are we using forwarding in this problem? I'm confused about when the next instruction should start if we are using forwarding.

@matthewwatkins88 6 years ago

The example definitely assumes forwarding. I'm not 100% sure what you mean by "start." The processor fetches the next instruction the next cycle. If there is a dependency that forwarding can't handle, then the processor will stall the necessary stages (stalling is shown in the example by stages shown in '()', such as (F)).
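To see how stalls propagate through the diagram, here is a small in-order pipeline simulator sketch. It assumes one instruction per stage, first fetch in cycle 1, and that a forwarded operand can be used in the cycle its producer occupies the "ready" stage. It prints each extra stalled cycle as the stage letter in parentheses, which is our reading of the '(F)' notation; the video's diagram may place the parentheses differently. `simulate`, `render`, and the abbreviated instruction labels are hypothetical.

```python
STAGES = "FDEMW"
F, D, E, M, W = range(len(STAGES))

def simulate(program):
    """Return, for each instruction, the cycle in which it enters each stage.

    program: list of (text, deps); each dep is (producer_index, ready_stage, need_stage).
    """
    entry = []
    for _, deps in program:
        prev = entry[-1] if entry else None
        row = []
        for s in range(len(STAGES)):
            c = row[-1] + 1 if row else 1
            if prev is not None:
                # Wait until the previous instruction has left stage s (entered s + 1).
                c = max(c, prev[s + 1] if s + 1 < len(STAGES) else prev[s] + 1)
            for producer, ready, need in deps:
                if s == need + 1:
                    # Stay in `need` until the producer's value is forwardable.
                    c = max(c, entry[producer][ready] + 1)
            row.append(c)
        entry.append(row)
    return entry

def render(program, entry):
    """One row per instruction; extra (stalled) cycles shown in parentheses."""
    last = max(row[-1] for row in entry)
    lines = []
    for (text, _), row in zip(program, entry):
        cells = [""] * (last + 1)
        for s, start in enumerate(row):
            end = row[s + 1] if s + 1 < len(row) else start + 1
            for c in range(start, end):
                cells[c] = STAGES[s] if c == start else f"({STAGES[s]})"
        lines.append(f"{text:<10}" + "".join(f"{cell:>5}" for cell in cells[1:]))
    return "\n".join(lines)

# Abbreviated excerpt of the loop; operands other than $t0 are omitted.
PROGRAM = [
    ("slt $t0", []),
    ("beq $t0", [(0, M, D)]),  # decode-resolved branch needs slt's result in D
    ("add $t0", []),
    ("lw $t0", [(2, M, E)]),   # lw needs its base address in E
]

print(render(PROGRAM, simulate(PROGRAM)))
```

Under these assumptions the beq row reads `D (D)` (one stall in decode), the add behind it reads `F (F)` (held in fetch), and the lw proceeds with no stall because add's result is forwarded into E.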

@wendyli6238 6 years ago

What I meant by "start" was where the next instruction would begin F, D, E, ... if we didn't use forwarding but needed information from the previous instruction. If we were not using forwarding and the next instruction needed a register value produced by the current instruction, we wouldn't decode the next instruction until after the current instruction finished its memory stage?

@matthewwatkins88 6 years ago

If there was no forwarding at all, the dependent instruction wouldn't truly start decode until the previous was in writeback (assuming writes to the register file appear to happen before reads, which is what is assumed in the video). Data is only written to the register file in writeback, so, without forwarding, wouldn't be available until then.
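The no-forwarding case in the reply above reduces to one rule, sketched here under its stated assumption that a register-file write in W happens before a read in D within the same cycle, so the consumer's D may coincide with the producer's W. The function name is hypothetical.

```python
# Stage indices: F=0, D=1, E=2, M=3, W=4.
F, D, E, M, W = range(5)

def stalls_without_forwarding(distance: int = 1) -> int:
    """Bubbles for a dependent instruction `distance` slots after its producer."""
    # Without forwarding, every consumer reads registers in D, and the producer's
    # value reaches the register file only in W. A producer fetched in cycle 0 is
    # in W in cycle 4; the consumer would be in D in cycle distance + 1.
    return max(0, W - (distance + D))

for distance in (1, 2, 3):
    print(f"dependent instruction {distance} behind: "
          f"{stalls_without_forwarding(distance)} stall(s)")
# dependent instruction 1 behind: 2 stall(s)
# dependent instruction 2 behind: 1 stall(s)
# dependent instruction 3 behind: 0 stall(s)
```

Because the value always arrives in W, the producer's type (ALU or load) no longer matters here; only the distance between the two instructions does.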

@wendyli6238 6 years ago

Thank you that is very helpful! :D

@mahanteshmise6930 5 years ago

For instructions 3 and 4, there must be a stall in decode for instruction 4. Correct me if I am wrong.

@boathecat919 6 years ago

For the case where neither branch is taken, why does the last instruction, "add $v0, $s0, $s0", have no cycles?

@codewithven9391 2 months ago

Because it's outside the loop. Only the instructions inside the loop are considered for this problem; we are determining the overall CPI for the loop.

@Manas09rai 3 years ago

Hey, I just wanted to ask: if an add instruction depended on an ld or lw instruction right before it, would there be the same 2-cycle stall as there was for the beq instruction that depended on the lw instruction?
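Applying the availability assumptions the author gives earlier in the thread (load data forwardable only once the lw is in W; a decode-resolved beq needs its operand in D; an add needs its operands in E), a sketch suggests one stall for lw followed by a dependent add, not two. This is an inference from that model, not a statement about the video; `stall_cycles` is a hypothetical helper.

```python
# Stage indices: F=0, D=1, E=2, M=3, W=4.
F, D, E, M, W = range(5)

def stall_cycles(ready_stage: int, need_stage: int, distance: int = 1) -> int:
    """Bubbles between a producer and a consumer fetched `distance` slots later."""
    return max(0, ready_stage - (distance + need_stage))

LOAD_READY = W  # loaded data can first be forwarded while the lw is in W

print("lw -> beq:", stall_cycles(LOAD_READY, D))  # 2 (branch needs $t0 in D)
print("lw -> add:", stall_cycles(LOAD_READY, E))  # 1 (add needs its operands in E)
```

The add needs the value one stage later than the branch does, so it saves one of the two bubbles.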

@yogeshbalbehra8930 7 years ago

What are the stages in a typical four-stage CPU pipeline, and what is the purpose of each stage? This question was on my exam. Can you help me with the answer?

@jayz6698 6 years ago

Why does the 14-cycle iteration include the first W but not the last W (between 5 and 17)?

@matthewwatkins88 6 years ago

As is noted in the comment for the video, there is a slightly updated version of this video (kzbin.info/www/bejne/eJvCc42VmZWCobc). The CPI calculation shown is correct, but, as you note, the line at ~7:00 should extend to cycle 18, for a total of 14 cycles. (Also, the W in cycle 18 for the slt should really be an M.)
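The cycle count in the reply above is an inclusive window: cycles 5 through 18 count both ends. This sketch shows that arithmetic and the usual CPI definition (cycles divided by instructions for one loop iteration); the helper names are hypothetical.

```python
def cycles_in_window(first_cycle: int, last_cycle: int) -> int:
    """Number of clock cycles from first_cycle to last_cycle, counting both ends."""
    return last_cycle - first_cycle + 1

def cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction over one loop iteration."""
    return cycles / instructions

iteration_cycles = cycles_in_window(5, 18)
print(iteration_cycles)  # 14
```

`cpi(iteration_cycles, n)` then gives the loop's CPI, where `n` is the number of loop instructions executed in that iteration (the count is not restated in this thread, so no value is assumed here).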

@martint5340 6 years ago

This is awesome. Thank you very much!

@eduardomiguelsalaspalacios3325 5 years ago

My buddy says you made a mistake. Is that true? What do you think?

@matthewwatkins88 5 years ago

I don't speak Spanish.

@CrashOverride332 2 years ago

This went way too fast for me. I kept having to rewind.

@zachnanabooboo517 6 years ago

Didn't know Mike Greenberg knew MIPS

@motorheadbanger90 6 years ago

There are a lot of things you are saying that contradict my teachings and readings on this matter. Can you please explain what you mean by the following: 1) "branch taken/not taken"; 2) forwarding. Additionally, are you saying that the value in $t0 cannot be accessed by the subsequent instruction until the memory stage of the previous instruction? And we have forwarding in this problem? If so, your understanding of forwarding and mine contradict each other. Can you help explain?

@mohammadrezabaqery7492 5 years ago

Did you forget to resolve the dependency between add $t0, $s3, $s4 and lw $t0, 0($t0)?

@dmm2708 4 years ago

There is a dependency, but it doesn't change the outcome.

@pavuluriviratchowdary4480 4 years ago

Because $t0 is already computed by the first instruction, there is no need for the processor to compute it a second time.

@perionan7281 7 years ago

OH MY GOD!! THANK YOU

@a96185e 5 years ago

this is fantastic :)

@x3axDev 6 months ago

Thank you, Tony Stark

@pacifiky 9 months ago

This is so cool

@mehmetb8703 6 years ago

nice tutorial

@profitjourneywithsk2136 5 years ago

Good video

@sukrusekeroglu 4 years ago

Ohh, no offence, but I am happy to hear a non-Indian accent. I said "oh god, thanks" at the beginning of the video.

@FelixTheForgotten 2 years ago

Sometimes I am so desperate I try to understand Vietnamese videos to study. Always feels good to find an English video even though it isn't my first language

@PEGuyMadison 4 months ago

Oh come on throw those branch delays in and show how inefficient RISC code is.

@matthewwatkins88 4 months ago

When you say "RISC", do you mean actual RISC? If so, actual RISC code is equivalent to what is shown. If you are referring to the MIPS code, then yes, real MIPS code would change the performance, but it doesn't necessarily destroy it.