This explanation of masked self-attention is so clear to me! Thank you for sharing!
@DmitryPesegov (1 year ago)
Need an example with a BATCH being fed into this. What would the rows in a batch be? What would Y look like? Only then is it possible to really see how the masks work.
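A minimal NumPy sketch of what a training batch could look like (all sizes and names are illustrative, not taken from the video): each row of the batch holds one target sequence shifted right, Y would be the same sequence shifted left by one position, and a single causal mask is broadcast over the whole batch.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

B, T, d = 2, 4, 8                      # batch size, sequence length, model dim (toy values)
rng = np.random.default_rng(0)
X = rng.normal(size=(B, T, d))         # decoder input: each row is one target sequence shifted right
# Y (the training target) would be the same sequences shifted left by one position,
# i.e. Y[b, t] is the token the decoder should predict at position t.

# One causal mask shared by every sequence in the batch: position t may only attend to positions <= t.
mask = np.triu(np.ones((T, T)), k=1).astype(bool)   # True above the diagonal = blocked

# Single-head self-attention with Q = K = V = X, just to show the shapes.
scores = X @ X.transpose(0, 2, 1) / np.sqrt(d)      # (B, T, T)
scores = np.where(mask, -np.inf, scores)            # mask broadcasts over the batch dimension
attn = softmax(scores)                              # (B, T, T), each row sums to 1
out = attn @ X                                      # (B, T, d)
print(out.shape)                                    # (2, 4, 8)
```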
@cedricmanouan1615 (1 year ago)
The first sentence of the video solved my problem 😅 "what enables us to parallelize calculation during training"
@farrugiamarc0 (8 months ago)
A very clear and amazingly detailed explanation of such a complex topic. It would be nice to have more videos related to ML from you!
@mir7tahmid (2 years ago)
Best explanation! Thank you, Mr. Svensson.
@hemanthsai369 (1 year ago)
Best video on masking!
@abrahamowos (2 years ago)
At 7:14, I thought the notations would be sm(Z_11, Z_12) and sm(Z_21, Z_22) for the second column... Is that correct?
@stevenhoang7297 (2 years ago)
Thank you for the video! Just to be clear, is the entire target input passed into the decoder? In the slide starting at 1:33, it looks like the last token is omitted.
@lennartsvensson7636 (2 years ago)
The end-of-sequence token is not passed into the decoder (since there is nothing left for us to predict/translate once we have obtained an EOS token). Is that token part of the target sentence? I guess that is a matter of taste/perspective.
@amirnasser7768 (1 year ago)
Thank you for the nice explanation. I think you missed mentioning that, in order to get zeros after masking with the softmax, you need to set the masked values (the upper triangle of the matrix) to negative infinity.
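A quick NumPy illustration of that point (the score values are made up, and the softmax is taken over each row here — the common queries-in-rows convention — which may be oriented differently from the slides):

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Raw attention scores for a 3-token sequence (arbitrary example numbers).
Z = np.array([[0.5, 1.2, -0.3],
              [0.1, 0.7,  0.9],
              [1.0, 0.2,  0.4]])

# Set the upper triangle (future positions) to -inf *before* the softmax ...
Z_masked = np.where(np.triu(np.ones_like(Z, dtype=bool), k=1), -np.inf, Z)

# ... so those entries become exactly 0 after it:
print(softmax_rows(Z_masked))
# Row 0 attends only to position 0, row 1 to positions 0-1, row 2 to all three.
```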
@zgx8181 (2 years ago)
Thanks for sharing. It's a good teaching video for a newbie like me.
@manishagarwal5323 (1 year ago)
Hi Professor, are there lectures, courses, or web links to what you teach? Love your clear, precise, and well-paced coverage of the concepts here! Many thanks.
@lennartsvensson7636 (1 year ago)
Thanks for your kind words. I have an online course in multi-object tracking (on YouTube and edX), but it is model-based instead of learning-based. Hopefully, I will soon find time to post more ML material.
@dirtyharry7280 (1 year ago)
Excellent, thank you!
@rushikeshnaik1502 (3 years ago)
Thanks for explaining it in detail.
@lennartsvensson7636 (3 years ago)
Glad you liked it!
@xiangzhou314 (3 years ago)
Thanks! That really helped
@asmersoy4111 (2 years ago)
Very helpful. Thank you!
@zimingzhang6336 (2 years ago)
Very clear explanation, but I want to know: does only the first decoder have a mask, or do all decoders have masks? And is the mask used only during training, or in both the training and prediction stages? Thanks a lot!
@zimingzhang6336 (2 years ago)
I seem to have figured it out. The answer is: all decoders have the mask, and it is used in both stages. Is that right?
Masked self-attention should ONLY come into play during training, as the decoding input is a sequence containing the answer (the future tokens). But we see almost everywhere that it also occurs during inference. So how does masked self-attention come into play during the generation process (once the model has been trained), since the as-yet-ungenerated tokens simply don't exist? Thanks for any clarification!
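Not from the video, but here is a small NumPy sketch (toy single-head attention, no learned projections, illustrative names) of why keeping the mask at generation time is both harmless and consistent: because of the mask, the outputs for already-generated positions do not change when a new token is appended, so the decoder sees exactly the same attention pattern it saw during training while only ever looking at tokens that exist.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def masked_self_attention(X):
    """Single-head causal self-attention with Q = K = V = X (toy example)."""
    T, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores = np.where(np.triu(np.ones((T, T)), k=1).astype(bool), -np.inf, scores)
    return softmax_rows(scores) @ X

rng = np.random.default_rng(0)
d = 4
prefix = rng.normal(size=(3, d))                          # embeddings of the 3 tokens generated so far
extended = np.vstack([prefix, rng.normal(size=(1, d))])   # one more token appended

out_prefix = masked_self_attention(prefix)
out_extended = masked_self_attention(extended)

# The mask prevents the new token from influencing the earlier positions,
# so the first three outputs are identical before and after appending it.
print(np.allclose(out_prefix, out_extended[:3]))          # True
```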
@aquienleimporta9096 (2 years ago)
How does the decoder match the size of the encoder's output with its own input at every step so that it can perform the matrix multiplications?
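If the question is about the encoder-decoder (cross) attention, here is a hedged sketch of the shapes (toy sizes, projection matrices omitted): the queries come from the decoder side and the keys/values from the encoder output, so the source and target lengths never need to match for the multiplications to work.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d = 8
T_enc, T_dec = 6, 3                      # source and target lengths need not be equal
rng = np.random.default_rng(0)
enc_out = rng.normal(size=(T_enc, d))    # encoder output
dec_in  = rng.normal(size=(T_dec, d))    # decoder-side representations

# Cross-attention: queries from the decoder, keys/values from the encoder.
Q, K, V = dec_in, enc_out, enc_out
scores = Q @ K.T / np.sqrt(d)            # (T_dec, T_enc) -- the shapes always line up
out = softmax_rows(scores) @ V           # (T_dec, d): one output per decoder position
print(scores.shape, out.shape)           # (3, 6) (3, 8)
```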
@kacemichakdi3048 (3 years ago)
Thank you for your explanation. I just don't understand what we do in the output embedding (the input of the decoder). Do we already know the translation?
@lennartsvensson7636 (3 years ago)
Did you watch part 5? Your question seems related to how we use the network during testing and training.
@zgx8181 (2 years ago)
Yes, we already know the translation during training.