L19.4.1 Using Attention Without the RNN -- A Basic Form of Self-Attention

13,453 views

Sebastian Raschka

1 day ago

Comments: 27
@mohammadyahya78 · 1 year ago
I think this is the best and most basic explanation of self-attention so far.
@geekyprogrammer4831 · 1 year ago
Your videos are super underrated. You deserve a lot more views!!
@SebastianRaschka · 1 year ago
Wow, thanks for the compliment. Maybe that's because I don't do any SEO, haha.
@vladimir_egay · 2 years ago
Best intro to self-attention I have seen so far! Thanks a lot!
@SebastianRaschka · 2 years ago
Wow, thanks! Glad to hear it was clear!
@vatsalpatel6330 · 1 year ago
Agreed.
@jameson1697 · 2 years ago
Phenomenal explanation. Thank you for your devotion to open and free education!
@angrest1 · 2 years ago
Thank you very much for these videos. They make complicated things seem much simpler and much more fun, and you do a great job explaining the intuition behind these sometimes quite confusing topics. So thanks again, it's a massive help!
@SebastianRaschka · 2 years ago
Thanks so much for saying this, I am glad to hear it!
@gluteusminimus2134 · 1 year ago
Love your videos. You are really good at breaking things down into clear steps. Most other videos on YouTube either don't make any sense or don't explain things at a deep enough level.
@nithinma8697 · 2 months ago
00:03 Introducing self-attention and transformer networks
02:05 Introduction to RNNs with an attention mechanism
04:08 The attention mechanism is a foundational concept in the transformer architecture
06:07 Introduction to the self-attention mechanism in transformers
08:04 RNNs with an attention mechanism use a weighted sum to compute the attention value
10:32 RNNs with an attention mechanism compute normalized attention weights using the softmax function
12:24 RNNs with attention use the dot product to compute similarity
14:29 Word embeddings in RNNs provide consistent values regardless of word position
(Crafted by Merlin AI)
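For readers who want to see the recipe in those timestamps as code, here is a minimal sketch (my own illustration, not code from the video) of the basic, parameter-free self-attention the lecture describes: dot-product similarity between embeddings, softmax-normalized attention weights, and a weighted sum. The embedding values below are random stand-ins.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(123)

# Toy sentence of 5 "words", each represented by a made-up 4-dimensional embedding.
embeddings = torch.randn(5, 4)            # shape: (sequence_length, embedding_dim)

# 1) Unnormalized attention scores: dot product between every pair of embeddings.
omega = embeddings @ embeddings.T         # shape: (5, 5)

# 2) Softmax over each row gives attention weights that sum to 1.
attention_weights = F.softmax(omega, dim=-1)

# 3) Each context vector is the attention-weighted sum of all input embeddings.
context_vectors = attention_weights @ embeddings   # shape: (5, 4)

print(attention_weights.sum(dim=-1))      # every row sums to 1.0
print(context_vectors.shape)              # torch.Size([5, 4])
```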
@algorithmo134 · 5 months ago
How do we create the word embeddings? Also, what is x_i at 12:38?
@nobywils · 2 years ago
Amazing and simple explanation!
@SebastianRaschka · 2 years ago
Thanks!!
@albertoandreotti7940 · 2 years ago
Only 902 views? This is a great resource!
@SebastianRaschka · 2 years ago
Hah, I take this as a compliment! Thanks!
@abubakarali6399 · 2 years ago
Why is geometric deep learning not included in the playlist?
@SebastianRaschka · 2 years ago
The semester is only so long ... But my new book (coming out next month) will have a chapter on graph neural nets!
@thiagopx1 · 2 years ago
The dot(x_i, x_j) doesn't make sense to me. It seems I am comparing the similarity between words instead of comparing the key and the query. Could you explain it better, please?
@SebastianRaschka · 2 years ago
This is a good point. Essentially, it is the same thing as computing the similarity between the query and a key, in its simple form without parameters. Instead of dot(x_i, x_j), the key-query computation would be dot(q_i, k_j), but the query itself is computed as q_i = Q x_i, and the key is computed as k_j = K x_j. So, if you don't use the weight matrices Q and K, this would be the same as the similarity between words.
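To make the reply concrete, below is a small PyTorch sketch (my own illustration with made-up toy dimensions, not code from the lecture) contrasting the parameter-free similarity dot(x_i, x_j) with the query/key version dot(q_i, k_j), where q_i = Q x_i and k_j = K x_j. Choosing Q and K as identity matrices recovers the plain word-to-word similarities, which is exactly the point of the reply.

```python
import torch

torch.manual_seed(123)

d = 4                       # toy embedding dimension
x = torch.randn(6, d)       # 6 word embeddings x_1 ... x_6 (random stand-ins)

# Parameter-free form from this video: similarity is dot(x_i, x_j).
scores_plain = x @ x.T      # shape: (6, 6)

# Parameterized form: q_i = Q x_i and k_j = K x_j, similarity is dot(q_i, k_j).
Q = torch.randn(d, d)       # stand-ins for learned weight matrices
K = torch.randn(d, d)
queries = x @ Q.T           # row i is q_i = Q x_i
keys    = x @ K.T           # row j is k_j = K x_j
scores_parameterized = queries @ keys.T

# With Q = K = I (no learnable weights), the two versions coincide.
I = torch.eye(d)
assert torch.allclose((x @ I.T) @ (x @ I.T).T, scores_plain)
```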
@davidlearnforus · 2 years ago
Great lesson. I am just starting to learn deep learning, and I may be asking something silly, but this self-attention looks to me like the same thing we have in graph neural networks.
@SebastianRaschka · 2 years ago
Yeah, I think it's somewhat related. Btw, there are actually graph attention networks as well 😅 arxiv.org/abs/1710.10903
@davidlearnforus · 2 years ago
@SebastianRaschka Many thanks for the answer and the paper.
@736939 · 2 years ago
What is the intuition behind taking the dot products between the word embeddings? As I understand it, not all embedding methods can be used to get attention from dot products; only those that represent the importance of terms to begin with (so TF-IDF and BM25 can't be used then).
@friedrichwilhelmhufnagel3577 · 1 year ago
Isn't a single dot product here called an attention score, and the whole sum the attention vector?
@WahranRai · 2 years ago
You underline/highlight too much, almost every word/sentence: your slides are overloaded and unreadable.
@SebastianRaschka · 2 years ago
Oh oh. I think I got better recently when I switched from an iPad to a pen tablet -- the iPad makes the annotation just too easy lol