L19.4.1 Using Attention Without the RNN -- A Basic Form of Self-Attention

13,453 views

Sebastian Raschka

1 day ago

Comments: 27
@mohammadyahya78 · 1 year ago
I think this is the best and most basic explanation of self-attention so far.
@geekyprogrammer4831 · 1 year ago
Your videos are super underrated. You deserve a lot more views!!
@SebastianRaschka · 1 year ago
Wow, thanks for the compliment. Maybe that's because I don't do any SEO, haha.
@vladimir_egay · 2 years ago
Best intro to self-attention I have seen so far! Thanks a lot!
@SebastianRaschka · 2 years ago
Wow, thanks! Glad to hear it was clear!
@vatsalpatel6330 · 1 year ago
Agreed.
@jameson1697 · 2 years ago
Phenomenal explanation. Thank you for your devotion to open and free education!
@angrest1 · 2 years ago
Thank you very much for these videos. They make complicated things seem much simpler and much more fun, and you do a great job explaining the intuition behind these sometimes quite confusing topics. So thanks again, it's a massive help!
@SebastianRaschka · 2 years ago
Thanks so much for saying this, I am glad to hear it!
@gluteusminimus2134 · 1 year ago
Love your videos. You are really good at breaking things down into clear steps. Most other videos on YouTube either don't make any sense or don't explain things at a deep enough level.
@nithinma8697 · 2 months ago
00:03 Introducing self-attention and transformer networks
02:05 Introduction to RNNs with an attention mechanism
04:08 The attention mechanism is a foundational concept in the transformer architecture
06:07 Introduction to the self-attention mechanism in transformers
08:04 RNNs with an attention mechanism use a weighted sum to compute the attention value
10:32 RNNs with an attention mechanism compute normalized attention weights using the softmax function
12:24 RNNs with attention use the dot product to compute similarity
14:29 Word embeddings in RNNs provide consistent values regardless of word position
(Crafted by Merlin AI)
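For readers who want to see the recipe in those timestamps as code, here is a minimal sketch (my own illustration, not code from the video) of the basic, parameter-free self-attention the lecture describes: dot-product similarity between embeddings, softmax-normalized attention weights, and a weighted sum. The embedding values below are random stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Toy sentence of 5 "words", each represented by a made-up 4-dimensional embedding.
embeddings = torch.randn(5, 4)            # shape: (sequence_length, embedding_dim)

# 1) Unnormalized attention scores: dot product between every pair of embeddings.
omega = embeddings @ embeddings.T         # shape: (5, 5)

# 2) Softmax over each row gives attention weights that sum to 1.
attention_weights = F.softmax(omega, dim=-1)

# 3) Each context vector is the attention-weighted sum of all input embeddings.
context_vectors = attention_weights @ embeddings   # shape: (5, 4)

print(attention_weights.sum(dim=-1))      # every row sums to 1.0
print(context_vectors.shape)              # torch.Size([5, 4])
```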
@algorithmo134 · 5 months ago
How do we create the word embeddings? Also, what is x_i at 12:38?
@nobywils · 2 years ago
Amazing and simple explanation!
@SebastianRaschka · 2 years ago
Thanks!!
@albertoandreotti7940 · 2 years ago
Only 902 views? This is a great resource!
@SebastianRaschka · 2 years ago
Hah, I take this as a compliment! Thanks!
@abubakarali6399 · 2 years ago
Why is geometric deep learning not included in the playlist?
@SebastianRaschka · 2 years ago
The semester is only so long ... But my new book (coming out next month) will have a chapter on graph neural nets!
@thiagopx1 · 2 years ago
The dot(x_i, x_j) doesn't make sense to me. It seems I am comparing the similarity between words instead of comparing the key and the query. Could you explain it better, please?
@SebastianRaschka · 2 years ago
This is a good point. Essentially, it is the same thing as computing the similarity between the query and a key, in its simple form without parameters. Instead of dot(x_i, x_j), the key-query computation would be dot(q_i, k_j), but the query itself is computed as q_i = Q x_i, and the key is computed as k_j = K x_j. So, if you don't use the weight matrices Q and K, this would be the same as the similarity between words.
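To make the reply concrete, below is a small PyTorch sketch (my own illustration with made-up toy dimensions, not code from the lecture) contrasting the parameter-free similarity dot(x_i, x_j) with the query/key version dot(q_i, k_j), where q_i = Q x_i and k_j = K x_j. Choosing Q and K as identity matrices recovers the plain word-to-word similarities, which is exactly the point of the reply.

```python
import torch

torch.manual_seed(123)

d = 4                       # toy embedding dimension
x = torch.randn(6, d)       # 6 word embeddings x_1 ... x_6 (random stand-ins)

# Parameter-free form from this video: similarity is dot(x_i, x_j).
scores_plain = x @ x.T      # shape: (6, 6)

# Parameterized form: q_i = Q x_i and k_j = K x_j, similarity is dot(q_i, k_j).
Q = torch.randn(d, d)       # stand-ins for learned weight matrices
K = torch.randn(d, d)
queries = x @ Q.T           # row i is q_i = Q x_i
keys    = x @ K.T           # row j is k_j = K x_j
scores_parameterized = queries @ keys.T

# With Q = K = I (no learnable weights), the two versions coincide.
I = torch.eye(d)
assert torch.allclose((x @ I.T) @ (x @ I.T).T, scores_plain)
```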
@davidlearnforus · 2 years ago
Great lesson. I am just starting to learn deep learning, and I may be asking something silly, but this self-attention looks to me like the same thing we have in graph neural networks.
@SebastianRaschka · 2 years ago
Yeah, I think it's somewhat related. Btw, there are actually graph attention networks as well 😅 arxiv.org/abs/1710.10903
@davidlearnforus · 2 years ago
@SebastianRaschka Many thanks for the answer and the paper.
@736939 · 2 years ago
What is the intuition behind taking the dot products between the word embeddings? As I understand it, not all embedding methods can be used to get attention from dot products; only those that represent the importance of terms to begin with (so TF-IDF and BM25 can't be used then).
@friedrichwilhelmhufnagel3577 · 1 year ago
Isn't a single dot product here called an attention score, and the whole sum the attention vector?
@WahranRai · 2 years ago
You underline/highlight too much, almost every word/sentence: your slides are overloaded and unreadable.
@SebastianRaschka · 2 years ago
Oh oh. I think I got better recently when I switched from an iPad to a pen tablet -- the iPad makes the annotation just too easy lol