I think this is the best and most basic explanation of self-attention so far.
@geekyprogrammer4831 a year ago
Your videos are super underrated. You deserve a lot more views!!
@SebastianRaschka a year ago
Wow thanks for the compliment. Maybe that's because I don't do any SEO, haha
@vladimir_egay 2 years ago
Best intro to self-attention I have seen so far! Thank you a lot!
@SebastianRaschka 2 years ago
Wow, thanks! Glad to hear it was clear!
@vatsalpatel6330 a year ago
Agreed.
@jameson1697 2 years ago
Phenomenal explanation. Thank you for your devotion to open and free education!
@angrest1 2 years ago
Thank you very much for these videos. They make complicated things seem much simpler and much more fun. And you do a great job explaining the intuition behind these sometimes quite confusing topics. So thanks again, it's a massive help!
@SebastianRaschka 2 years ago
Thanks so much for saying this, I am glad to hear it!
@gluteusminimus2134 a year ago
Love your videos. You are really good at breaking things down into clear steps. Most other videos on YouTube either don't make any sense or don't explain things at a deep enough level.
@nithinma8697 2 months ago
00:03 Introducing self-attention and transformer networks.
02:05 Introduction to RNNs with an attention mechanism.
04:08 The attention mechanism is a foundational concept in the transformer architecture.
06:07 Introduction to the self-attention mechanism in transformers.
08:04 RNNs with an attention mechanism use a weighted sum to compute the attention value.
10:32 RNNs with an attention mechanism involve computing normalized attention weights using the softmax function.
12:24 RNNs with attention use the dot product to compute similarity.
14:29 Word embeddings in RNNs provide consistent values regardless of word position.
Crafted by Merlin AI.
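For reference, the steps listed above (dot-product similarity, softmax normalization, weighted sum) can be written out in a few lines. Here is a minimal NumPy sketch with made-up toy embeddings; it is an illustration, not code from the video:

import numpy as np

# Toy "sentence" of 4 tokens with 8-dimensional embeddings (values made up).
X = np.random.randn(4, 8)

# 1) Unnormalized attention scores: dot product between every pair of embeddings.
scores = X @ X.T                      # scores[i, j] = dot(x_i, x_j), shape (4, 4)

# 2) Normalized attention weights: softmax over each row.
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# 3) Attention values: weighted sum of all embeddings for each position.
context = weights @ X                 # one context vector per token, shape (4, 8)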
@algorithmo134 5 months ago
How do we create the word embeddings? Also, what is x_i at 12:38?
@nobywils 2 years ago
Amazing and simple explanation!
@SebastianRaschka 2 years ago
Thanks!!
@albertoandreotti7940 2 years ago
Only 902 views? This is a great resource!
@SebastianRaschka 2 years ago
Hah, I take this as a compliment! Thanks!
@abubakarali6399 2 years ago
Why is geometric deep learning not included in the playlist?
@SebastianRaschka 2 years ago
The semester is only so long ... But my new book (coming out next month) will have a chapter on graph neural nets!
@thiagopx1 2 years ago
The dot(x_i, x_j) doesn't make sense to me. It seems like I am comparing the similarity between words instead of comparing a key and a query. Could you explain this better, please?
@SebastianRaschka 2 years ago
This is a good point. Essentially, it is the same thing as computing the similarity between a query and a key in its simple form without parameters. Instead of dot(x_i, x_j), the key-query computation would be dot(q_i, k_j), but the query itself is computed as q_i = Q x_i, and the key is computed as k_j = K x_j. So, if you don't use the weight matrices Q and K, this is the same as computing the similarity between the words directly.
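A tiny sketch of that point (the matrices below are random stand-ins for trained weights, purely for illustration): without Q and K the score is dot(x_i, x_j); with them it becomes dot(q_i, k_j), where q_i = Q x_i and k_j = K x_j.

import numpy as np

d = 8                                  # embedding dimension (arbitrary)
x_i, x_j = np.random.randn(d), np.random.randn(d)

# Parameter-free version: similarity directly between the word embeddings.
score_plain = x_i @ x_j                # dot(x_i, x_j)

# Parameterized version: project into query/key space first.
Q = np.random.randn(d, d)              # stand-in for a learned query matrix
K = np.random.randn(d, d)              # stand-in for a learned key matrix
q_i, k_j = Q @ x_i, K @ x_j
score_qk = q_i @ k_j                   # dot(q_i, k_j)

# If Q and K were both the identity matrix, score_qk would equal score_plain.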
@davidlearnforus 2 years ago
Great lesson. I am just starting to learn DL, and I may be asking something silly, but this self-attention looks to me like the same thing we have in graph NNs.
@SebastianRaschka 2 years ago
Yeah, I think it's somewhat related. Btw, there are actually graph attention networks as well 😅 arxiv.org/abs/1710.10903
@davidlearnforus 2 years ago
@SebastianRaschka Many thanks for the answer and the paper!
@736939 2 years ago
What is the intuition behind taking the dot products between the word embeddings? As I understood it, not all embedding methods can be used to compute attention via dot products, only those that represent the importance of terms to begin with (so TF-IDF and BM25 can't be used then).
@friedrichwilhelmhufnagel3577 a year ago
Isn't a single dot product here called an attention score, and the whole sum the attention vector?
@WahranRai 2 years ago
You underline/highlight too much, almost every word/sentence: your slides are overloaded and unreadable.
@SebastianRaschka 2 years ago
Oh oh. I think I got better recently when I switched from an iPad to a pen tablet -- the iPad makes the annotation just too easy lol