This is the best explanation of self-attention I've heard... perfectly explained 👍
@cokicoki-wv7vp • 1 month ago
I watched a lot of attention-mechanism lectures on YouTube, but none of them gave me a clear idea of what it is. Your video makes the point very clearly. Thanks very much, you are a good teacher.
@khanupdates8022 • 5 months ago
A really simple and clear explanation. I had been looking for such a lecture for many days. Thanks
@Ndheti • 2 months ago
1:50 To clarify: the query weights are the same for all the words, the key weights are the same for all the words, and the value weights are the same for all the words. He does not mean that query weights = key weights = value weights. Great video, and I look forward to doing his course.

The way I think of self-attention is: you take an embedding matrix and add context to it. That's it, simple. Adding context is like adding a description to a word. The plain vanilla word "bank" can mean a million things - Bank of America, a bank in town X, a bank painted blue, a bank that was robbed last year, the Bank of England, etc. - but once you have self-attention, you generate a new embedding that gives "bank" context, e.g. "a bank that is painted red outside and located in street X of Barcelona." That's the end goal of self-attention; keep your focus on that.
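To make that concrete, here is a minimal NumPy sketch (the sizes and variable names are my own illustrative choices, not from the video): one shared set of query/key/value weight matrices is applied to every word's embedding, producing a context-aware embedding matrix of the same shape.

import numpy as np

rng = np.random.default_rng(0)
d = 4                                    # toy embedding size
X = rng.normal(size=(3, d))              # embeddings for 3 words, one row each

# One weight matrix of each kind, shared across ALL words;
# W_q, W_k, W_v are three DIFFERENT matrices (learned during training).
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # every word uses the same weights

scores = Q @ K.T / np.sqrt(d)            # word-to-word attention scores
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
context = weights @ V                    # new, context-aware embeddings

print(context.shape)                     # (3, 4): same shape, contextualized meaning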
@acaudio7545 • 4 months ago
Fantastic explanation - very good overview. Revisiting this from last year and this helped me visualise exactly how it worked. Great job
@sitaositao • 4 months ago
Thanks a lot for a very clear explanation. This is what's needed in ML course videos.
@for-ever-22 • 2 months ago
Exceptional video detailing this concept
@abderahimmazouz2088 • 1 month ago
Best explanation of self-attention
@NicklasHolmberg • 3 months ago
Great explanation, thanks a lot!
@SeifMohamed-d6w • 4 months ago
Clear explanation!
@Yudios • 6 months ago
Thanks for the clear explanation. Very helpful.
@19AKS58 • 2 months ago
Great video. What comprises the QUERY vector? Exactly how is it different from the initial Embedding vector for that same word? I'd ask the same question about the Key & Value vectors.
@machinelearningcourses4764 • 1 month ago
In transformers, specifically in the attention mechanism, the Query, Key, and Value vectors are derived from the Embedding vector for each word but serve distinct roles:

Embedding vector: the initial representation of a word, capturing its meaning based on its context in training. It's a starting "meaning" for each word in the model's vocabulary.

Query, Key, and Value vectors: created by multiplying the embedding vector by different learned weight matrices, so each word's embedding vector gets transformed into a Query vector (Q), a Key vector (K), and a Value vector (V).

- Query (Q): determines how much focus this word should place on others. It essentially "asks" how much attention should go to other words.
- Key (K): represents the features of each word that other words might attend to. It "answers" the query, letting the model determine which words should get attention.
- Value (V): contains the actual information from the word's embedding that will be used in producing the output of the attention layer. The Value vectors are combined based on the attention scores (derived from Queries and Keys).

In short, while the embedding is a static representation of meaning, the Q, K, and V vectors dynamically adjust meaning to enable context-specific attention.
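A small NumPy sketch of this answer from one word's point of view (the dimensions and names are illustrative assumptions, not the lecturer's): the same static embedding is multiplied by three different learned matrices, and the word's output is an attention-weighted mix of all the Value vectors.

import numpy as np

rng = np.random.default_rng(1)
n_words, d_model, d_head = 5, 8, 4

E = rng.normal(size=(n_words, d_model))  # static embeddings, one row per word

# Three distinct learned projections (random here as stand-ins for trained weights).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

i = 2                                    # focus on one word
q_i = E[i] @ W_q                         # its Query "asks"
K = E @ W_k                              # every word's Key "answers"
V = E @ W_v                              # every word's Value carries content

scores = K @ q_i / np.sqrt(d_head)       # one relevance score per word
attn = np.exp(scores)
attn /= attn.sum()                       # softmax -> attention weights
out_i = attn @ V                         # weighted blend of Value vectors

print(attn.round(2))                     # weights over the 5 words, summing to 1
print(out_i.shape)                       # (4,): context-specific output for word i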
@maksim3663 • 2 months ago
I'm pretty sure it is a good lecture, but one minute to explain that every word has three vectors, and then two minutes explaining literally the same thing ("we multiply the query by every word's key vector") just pads the video with filler. Sensible timestamps (or breaking the video into chapters) would solve this, I'm sure.