Query, Key and Value Matrix for Attention Mechanisms in Large Language Models

11,098 views

Machine Learning Courses

A day ago

Comments: 14
@aluna73
@aluna73 2 months ago
This is the best explanation I've heard about self-attention... perfectly explained 👍
@cokicoki-wv7vp
@cokicoki-wv7vp A month ago
I watched a lot of attention-mechanism lectures on YouTube, but none of them gave me a clear sense of what it is. Your video makes it very clear. Thanks very much, you are a good teacher.
@khanupdates8022
@khanupdates8022 5 months ago
A really simple and clear explanation. I had been looking for such a lecture for many days. Thanks.
@Ndheti
@Ndheti 2 months ago
1:50 To clarify: the query weights are the same for all the words, the key weights are the same for all the words, and the value weights are the same for all the words. He does not mean that the query weights, key weights, and value weights are equal to each other. Great video, and I look forward to doing his course. The way I think of self-attention is: you take an embedding matrix and add context to it. That's it, simple. Adding context is similar to adding descriptions to a word. You can have a plain vanilla word "bank" and it can mean a million things: Bank of America, a bank in town X, a bank painted blue, a bank that was robbed last year, the Bank of England, etc. But once you have self-attention, you generate a new embedding that gives context to "bank", e.g. "a bank that is painted red outside and located in street X of Barcelona." That's the end goal of self-attention; keep your focus on that.
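A minimal NumPy sketch of the point above, with made-up sizes and random weights rather than anything from the video: one shared W_q, W_k, W_v is applied to every word, and the same static "bank" embedding comes out different once its neighbouring words change.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8  # illustrative embedding size, not the video's numbers

# The same three projection matrices are reused for every word in the sentence.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X):
    """X: (seq_len, d_model) static embeddings -> context-mixed embeddings."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)  # how strongly each word attends to every other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V

bank  = rng.normal(size=d_model)   # one static embedding standing in for "bank"
river = rng.normal(size=d_model)   # stand-in context word, e.g. "river"
money = rng.normal(size=d_model)   # stand-in context word, e.g. "money"

# The same "bank" vector goes in; different contextualized vectors come out.
print(self_attention(np.stack([river, bank]))[1][:3])
print(self_attention(np.stack([money, bank]))[1][:3])
```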
@acaudio7545
@acaudio7545 4 months ago
Fantastic explanation - very good overview. Revisiting this from last year and this helped me visualise exactly how it worked. Great job
@sitaositao
@sitaositao 4 months ago
Thanks a lot for a very clear explanation. This is what's needed in ML course videos.
@for-ever-22
@for-ever-22 2 months ago
Exceptional video detailing this concept
@abderahimmazouz2088
@abderahimmazouz2088 A month ago
Best explanation of self-attention.
@NicklasHolmberg
@NicklasHolmberg 3 months ago
Great explanation, thanks a lot!
@SeifMohamed-d6w
@SeifMohamed-d6w 4 months ago
Clear explanation!
@Yudios
@Yudios 6 months ago
Thanks for the clear explanation. Very helpful.
@19AKS58
@19AKS58 2 months ago
Great video. What comprises the QUERY vector? Exactly how is it different from the initial Embedding vector for that same word? I'd ask the same question about the Key & Value vectors.
@machinelearningcourses4764
@machinelearningcourses4764 A month ago
In transformers, specifically in the attention mechanism, the Query, Key, and Value vectors are derived from the embedding vector for each word but serve distinct roles:
Embedding vector: the initial representation of a word, capturing its meaning based on its context in training. It's like a starting "meaning" for each word in the model's vocabulary.
Query, Key, and Value vectors: created by multiplying the embedding vector by different learned weight matrices, so each word's embedding vector gets transformed into a Query vector (Q), a Key vector (K), and a Value vector (V).
Query (Q): determines how much focus this word should place on others. It essentially "asks" how much attention should go to other words.
Key (K): represents the features of each word that other words might attend to. It "answers" the query by letting the model determine which words should get attention.
Value (V): contains the actual information from the word's embedding that will be used in producing the output of the attention layer. The Value vectors are combined based on the attention scores (derived from Queries and Keys).
In short, while the embedding is a static representation of meaning, the Q, K, and V vectors dynamically adjust meaning to enable context-specific attention.
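A rough NumPy illustration of this answer (random stand-in "learned" weights and made-up dimensions, not the video's actual setup): each embedding row is projected into Q, K, and V; scores from each Query against every Key are softmaxed into attention weights; and the output is an attention-weighted mix of the Value rows.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model = 4, 6                     # illustrative sizes

X   = rng.normal(size=(seq_len, d_model))   # one embedding row per word
W_q = rng.normal(size=(d_model, d_model))   # learned projection for Queries
W_k = rng.normal(size=(d_model, d_model))   # learned projection for Keys
W_v = rng.normal(size=(d_model, d_model))   # learned projection for Values

Q = X @ W_q   # "what am I looking for?"
K = X @ W_k   # "what do I offer to be matched against?"
V = X @ W_v   # "what information do I pass along?"

scores  = Q @ K.T / np.sqrt(d_model)             # each word's query against every word's key
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row of attention weights sums to 1

output = weights @ V    # each word's output is a weighted mix of the Value vectors
print(output.shape)     # (4, 6): same shape as X, but now context-dependent
```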
@maksim3663
@maksim3663 2 months ago
I'm pretty sure it is a good lecture, but one minute to explain that every word has three vectors, and then two minutes explaining essentially the same thing ("we multiply the query by every word's key vector") just pads the video with filler. Sensible timestamps (or breaking the video into chapters) would solve this, I'm sure.