Transformers - Part 7 - Decoder (2): masked self-attention

20,366 views

Lennart Svensson

1 day ago

Comments: 28
@i-fanlin568
@i-fanlin568 2 years ago
This masked self-attention is so clear to me! Thank you for sharing!
@DmitryPesegov
@DmitryPesegov 1 year ago
We need an example with a BATCH being fed into this. What would the rows in a batch be? What would Y look like? Only then is it really possible to see how the masks work.
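A minimal sketch of what such a batch could look like (my own PyTorch illustration, not from the video; the tensor names and sizes are assumptions): each batch element is one target sequence, the same causal mask is broadcast over the whole batch, and Y keeps the shape (batch, sequence length, model dimension).

```python
import torch

batch_size, seq_len, d_model = 2, 4, 8          # 2 target sequences, 4 tokens each
Q = torch.randn(batch_size, seq_len, d_model)   # queries: one row per token
K = torch.randn(batch_size, seq_len, d_model)   # keys
V = torch.randn(batch_size, seq_len, d_model)   # values

scores = Q @ K.transpose(-2, -1) / d_model**0.5             # (batch, seq, seq)
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))             # same mask for every batch element
weights = torch.softmax(scores, dim=-1)                       # each row sums to 1
Y = weights @ V                                               # (batch, seq, d_model)
# Y[b, t] depends only on tokens 0..t of sequence b, so all positions of all
# sequences in the batch can be computed in parallel during training.
```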
@cedricmanouan1615
@cedricmanouan1615 1 year ago
The first sentence of the video solved my problem 😅 "what enables us to parallelize calculation during training"
@farrugiamarc0
@farrugiamarc0 8 months ago
A very clear and amazingly detailed explanation of such a complex topic. It would be nice to have more videos related to ML from you!
@mir7tahmid
@mir7tahmid 2 years ago
Best explanation! Thank you, Mr. Svensson.
@hemanthsai369
@hemanthsai369 1 year ago
Best video on masking!
@abrahamowos
@abrahamowos 2 years ago
At 7:14, I thought the notation would be sm(Z_11, Z_12) and sm(Z_21, Z_22) for the second column... Is that correct?
@stevenhoang7297
@stevenhoang7297 2 years ago
Thank you for the video! Just to be clear, is the entire target input passed into the decoder? In the slide starting at 1:33, it looks like the last token is omitted.
@lennartsvensson7636
@lennartsvensson7636 2 years ago
The end-of-sequence token is not passed into the decoder (since there is nothing left for us to predict/translate once we have obtained an EOS token). Is that token part of the target sentence? I guess that is a matter of taste/perspective.
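For anyone else puzzled by the same slide, a tiny sketch of the usual teacher-forcing shift (generic placeholder tokens; an illustration, not necessarily the exact setup shown in the video):

```python
# Target sentence with start/end markers (placeholder tokens).
target = ["<BOS>", "y1", "y2", "y3", "<EOS>"]

decoder_input  = target[:-1]   # ['<BOS>', 'y1', 'y2', 'y3']  - EOS is never fed in
decoder_output = target[1:]    # ['y1', 'y2', 'y3', '<EOS>']  - what the decoder must predict
```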
@amirnasser7768
@amirnasser7768 1 year ago
Thank you for the nice explanation. I think you forgot to mention that, in order for the masking to produce zeros after the softmax, you need to set the values in the upper triangle of the matrix to negative infinity.
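A small sketch of that point (a standard implementation pattern in PyTorch, hopefully matching what the video intends): entries set to negative infinity receive exactly zero weight after the softmax.

```python
import torch

Z = torch.randn(3, 3)                                         # raw attention scores
mask = torch.triu(torch.ones(3, 3, dtype=torch.bool), diagonal=1)
Z_masked = Z.masked_fill(mask, float("-inf"))                 # hide future positions
W = torch.softmax(Z_masked, dim=-1)
print(W)   # upper-triangular entries are exactly 0; every row still sums to 1
```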
@zgx8181
@zgx8181 2 years ago
Thanks for sharing. It's a good teaching video for newbies like me.
@manishagarwal5323
@manishagarwal5323 1 year ago
Hi Professor, are there lectures, courses, or web links to what you teach? I love your clear, precise, and well-paced coverage of the concepts here! Many thanks.
@lennartsvensson7636
@lennartsvensson7636 1 year ago
Thanks for your kind words. I have an online course in multi-object tracking (on YouTube and edX), but it is model-based rather than learning-based. Hopefully, I will soon find time to post more ML material.
@dirtyharry7280
@dirtyharry7280 1 year ago
Excellent, thank you!
@rushikeshnaik1502
@rushikeshnaik1502 3 years ago
Thanks for explaining it in detail.
@lennartsvensson7636
@lennartsvensson7636 3 years ago
Glad you liked it!
@xiangzhou314
@xiangzhou314 3 years ago
Thanks! That really helped
@asmersoy4111
@asmersoy4111 2 years ago
Very helpful. Thank you!
@zimingzhang6336
@zimingzhang6336 2 years ago
Very clear explanation, but I want to know: does only the first decoder have a mask, or do all decoders have masks? And is the mask used only in training, or in both the training and prediction stages? Thanks a lot!
@zimingzhang6336
@zimingzhang6336 2 years ago
I seem to have figured it out. The answer is: all decoders have a mask, and the mask is used in both stages. Is that right?
@zgx8181
@zgx8181 2 years ago
@@zimingzhang6336 The mask only exists at training time, or rather it is only during training that the mask is useful; whether there is a mask at prediction time I don't know, it depends on the specific implementation. During training we compute in parallel for speed, and for that the complete translation is fed in at once, whereas at actual prediction time it is impossible to have the complete result before it has been predicted, so during training the mask hides the parts that could not yet have been obtained if we were predicting. (By the way, when asking questions in English on YouTube, please try to keep the grammar standard; it is easier to read 😂)
@zimingzhang6336
@zimingzhang6336 2 years ago
@@zgx8181 There is still a mask at prediction time. The mask does not just hide the later labels; it also guarantees that each position in the decoder's input sequence depends only on the sequence before it. If you removed the mask at prediction time, the decoder's output would change from step to step.
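A sketch of the point made in this reply (my own simplified example: learned projection matrices are omitted and all names are made up): with the causal mask in place, re-running the decoder's self-attention on a longer prefix leaves the outputs for the earlier positions unchanged.

```python
import torch

def masked_self_attention(X):
    """Causal self-attention on X of shape (seq_len, d_model); projections omitted for brevity."""
    seq_len, d_model = X.shape
    scores = X @ X.T / d_model**0.5
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ X

X3 = torch.randn(3, 8)                        # embeddings of the 3 tokens generated so far
X4 = torch.cat([X3, torch.randn(1, 8)])       # one more generated token appended
out3 = masked_self_attention(X3)
out4 = masked_self_attention(X4)
print(torch.allclose(out3, out4[:3]))         # True: earlier positions are unaffected
```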
@benoitmialet9842
@benoitmialet9842 8 months ago
Masked self-attention should ONLY come into play during training, since the decoder input is a sequence containing the answer (the future tokens). But we see almost everywhere that it also occurs during inference. So how does masked self-attention come into play during the generation process (once the model has been trained), since the as-yet-ungenerated tokens simply don't exist? Thanks for any clarification!
@aquienleimporta9096
@aquienleimporta9096 2 years ago
How does the decoder match the size of the encoder's output with its own input at every step so that the matrix multiplications can be carried out?
@kacemichakdi3048
@kacemichakdi3048 3 years ago
Thank you for your explanation. I just don't understand what we do with the output embedding (the input of the decoder). Do we already know the translation?
@lennartsvensson7636
@lennartsvensson7636 3 years ago
Did you watch part 5? Your question seems related to how we use the network during testing and training.
@zgx8181
@zgx8181 2 years ago
Yes, we already know the translation during training.
@alirezaazadbakht3923
@alirezaazadbakht3923 3 years ago
thanks.
Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention
8:53
Transformers - Part 1 - Self-attention: an introduction
15:56
Lennart Svensson
18K views
Transformer - Part 6 - Decoder (1): testing and training
10:45
Lennart Svensson
9K views
Multi Head Attention in Transformer Neural Networks with Code!
15:59
Attention in transformers, visually explained | DL6
26:10
3Blue1Brown
2M views
The math behind Attention: Keys, Queries, and Values matrices
36:16
Serrano.Academy
273K views
Transformers - Part 2 - Self attention complete equations
9:52
Lennart Svensson
9K views
Flash Attention Machine Learning
25:34
Stephen Blum
3.7K views
Query, Key and Value Matrix for Attention Mechanisms in Large Language Models
18:21
Machine Learning Courses
11K views
Attention Is All You Need
27:07
Yannic Kilcher
661K views