The Attention Mechanism in Large Language Models

103,475 views

Serrano.Academy

1 day ago

189 comments
@arvindkumarsoundarrajan9479 10 months ago
I have been reading the "attention is all you need" paper for like 2 years. Never understood it properly like this ever before😮. I'm so happy now🎉
@drdr3496 9 months ago
This is a great video (as are the other 2) but one thing that needs to be clarified is that the embeddings themselves do not change (by attention @10:49). The gravity pull analogy is appropriate but the visuals give the impression that embedding weights change. What changes is the context vector.
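As a rough sketch of that distinction (the tokens, vectors, and weights below are made up for illustration, not taken from the video): attention reads the embeddings and produces a new context vector, but the embedding table itself is left untouched.

```python
import numpy as np

# Toy, made-up embeddings for three tokens.
embeddings = np.array([
    [1.0, 0.0],   # "buy"
    [0.2, 0.9],   # "apple"
    [0.1, 1.0],   # "orange"
])

# Hypothetical attention weights for "apple" over the sentence
# (in a real model these come from softmax(Q K^T / sqrt(d))).
weights_for_apple = np.array([0.1, 0.5, 0.4])

# Attention outputs a context vector as a weighted sum of the embeddings;
# the embedding matrix itself is not modified.
context_apple = weights_for_apple @ embeddings

print(context_apple)   # the new, context-aware vector for "apple"
print(embeddings)      # the original embeddings, unchanged
```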
@ronitakhariya4094 9 days ago
absolutely loved the last part explaining the linear transformations of the query, key, and value matrices. thank you so much!
@RG-ik5kw a year ago
Your videos in the LLM uni are incredible. Builds up true understanding after watching tons of other material that was all a bit loose on the ends. Thank you!
@Compsci-v6q 2 months ago
This channel is underrated; your explanations are the best among all the channels I've watched
@Aidin-f5v 20 days ago
That was awesome, thank you. You saved me a lot of time reading and watching nonsense videos and texts.
@malikkissoum730 a year ago
Best teacher on the internet, thank you for your amazing work and the time you took to put those videos together
@gunjanmimo a year ago
This is one of the best videos on YouTube to understand ATTENTION. Thank you for creating such outstanding content. I am waiting for upcoming videos of this series. Thank you ❤
@GrahamAnderson-z7x 6 months ago
I love your clear, non-intimidating, and visual teaching style.
@SerranoAcademy 6 months ago
Thank you so much for your kind words and your kind contribution! It’s really appreciated!
@saeed577 9 months ago
THE best explanation of this concept. That was genuinely amazing.
@JyuSub 8 months ago
Just THANK YOU. This is by far the best video on the attention mechanism for people that learn visually
@FawadMahdi-o2h 2 months ago
This was hands down the best explanation I've seen of attention mechanisms and multi-head attention --- the fact that I'm able to use these words in this sentence means I understand it
@EricMutta a year ago
Truly amazing video! The published papers never bother to explain things with this level of clarity and simplicity, which is a shame because if more people outside the field understood what is going on, we may have gotten something like ChatGPT about 10 years sooner! Thanks for taking the time to make this - the visual presentation with the little animations makes a HUGE difference!
@decryptifi2265 19 days ago
What a beautiful way of explaining the "Attention Mechanism". Great job Serrano
@bobae1357 9 months ago
best description ever! easy to understand. I've been struggling to understand attention. Finally I can say I know it!
@aadeshingle7593 a year ago
One of the best intuitions for understanding multi-head attention. Thanks a lot!❣
@anipacify1163 9 months ago
Omg this video is on a whole new level. This is probably the best intuition behind transformers and attention, and the best way to understand them. I went through a couple of videos online and finally found the best one. Thanks a lot! Helped me understand the paper easily
@mohameddjilani4109 a year ago
I really enjoyed how you give a clear explanation of the operations and the representations used in attention
@amoghjain 11 months ago
Thank you for making this video series for the sake of a learner and not to show off your own knowledge!! Great anecdotes and simple examples really helped me understand the key concepts!!
@TheMircus224 11 months ago
These videos where you explain the transformers are excellent. I have gone through a lot of material; however, it is your videos that have allowed me to understand the intuition behind these models. Thank you very much!
@sari54754 a year ago
The easiest-to-understand video on the subject I've seen.
@ccgarciab 8 months ago
This is such a good, clear and concise video. Great job!
@nealdavar939 7 months ago
The way you break down these concepts is insane. Thank you
@apah a year ago
So glad to see you're still active Luis! You and StatQuest's Josh Starmer really are the backbone of more ML professionals than you can imagine
@calum.macleod a year ago
I appreciate your videos, especially how you can apply a good perspective to understand the high level concepts, before getting too deep into the maths.
@ajnbin 11 months ago
Fantastic!!! The explanation itself is a piece of art. The step-by-step approach, the abstractions... Kudos!! Please, more of these
@pruthvipatel8720 a year ago
I always struggled with KQV in the attention paper. Thanks a lot for this crystal clear explanation! Eagerly looking forward to the next videos on this topic.
@guru7856 2 months ago
Thank you for your explanation! I've always wondered why the attention mechanism in Transformers produces more effective embeddings compared to Word2Vec, and your video clarified this well. Word2Vec generates static embeddings, meaning that a word always has the same representation, regardless of the context in which it appears. In contrast, Transformers create context-dependent embeddings, where the representation of a word is influenced by the words around it. This dynamic approach is what makes Transformer embeddings so powerful.
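A minimal sketch of that contrast, with a made-up vocabulary and uniform weights standing in for real attention scores: a static lookup returns the same vector everywhere, while a context-dependent one changes with the sentence.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"bank": 0, "river": 1, "money": 2}
static_table = rng.normal(size=(len(vocab), 4))   # Word2Vec-style lookup table

def static_embedding(word):
    # Same vector no matter which sentence the word appears in.
    return static_table[vocab[word]]

def contextual_embedding(word, sentence):
    # Crude stand-in for self-attention: blend the vectors of the
    # surrounding words into the target word's representation.
    vectors = np.stack([static_table[vocab[w]] for w in sentence])
    weights = np.full(len(sentence), 1.0 / len(sentence))
    return weights @ vectors

print(static_embedding("bank"))                          # always the same
print(contextual_embedding("bank", ["river", "bank"]))   # differs by context...
print(contextual_embedding("bank", ["money", "bank"]))   # ...from this one
```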
@sayamkumar7276 a year ago
This is one of the clearest, simplest and most intuitive explanations of the attention mechanism. Thanks for making such a tedious and challenging concept relatively easy to understand 👏 Looking forward to the upcoming 2 videos of this series on attention
@kevon217 a year ago
Wow, clearest example yet. Thanks for making this!
@hyyue7549 11 months ago
If I understand correctly, the transformer is basically an RNN model intercepted by a bunch of different attention layers. The attention layers redo the embeddings every time a new word comes in; the new embeddings are calculated based on the current context and the new word, then the embeddings are sent to the feed-forward layer and behave like the classic RNN model.
@lohithArcot 3 months ago
Can anyone confirm this?
@MikeTon 10 months ago
This clarifies EMBEDDING matrices:
- In particular, the point about how a book isn't just a RANDOM array of words; matrices are NOT a RANDOM array of numbers
- The visualization of the transformation and shearing really drives home the V, Q, K aspect of the attention matrix that I have been STRUGGLING to internalize
Big, big thanks for putting together this explanation!
@pranayroy 9 months ago
Kudos to your efforts in clear explanation!
@arulbalasubramanian9474 a year ago
Great explanation. After watching a handful of videos this one really makes it real easy to understand.
@agbeliemmanuel6023 a year ago
Wooow thanks so much. You are a treasure to the world. Amazing teacher of our time.
@docodemo727 a year ago
this video is really teaching you the intuition. much better than the others I went through that just throw formulas at you. thanks for the great job!
@karlbooklover a year ago
best explanation of embeddings I've seen, thank you!
@abu-yousuf a year ago
amazing explanation Luis. Can't thank you enough for your amazing work. You have a special gift for explaining things. Thanks.
@iliasp4275 6 months ago
Excellent video. Best explanation on the internet !
@dragolov a year ago
Deep respect, Luis Serrano! Thank you so much!
@soumen_das a year ago
Hey Luis, you are AMAZING! Your explanations are incredible.
@rikiakbar4025 4 months ago
Thanks Luis, I've been following your content for a while. This video about the attention mechanism is very intuitive and easy to follow
@muhammetibrahimkaraman7471 a month ago
I've really enjoyed the way you described and demonstrated matrices as linear transformations. Thank you! Why? Because I like linear algebra 😄
@satvikparamkusham7454 a year ago
This is the most amazing video on "Attention is all you need"
@kafaayari a year ago
Well, the gravity example is how I understood this after a long time. You are a true legend.
@DeepakSharma-xg5nu 8 months ago
I did not even realize this video is 21 minutes long. Great explanation.
@epistemophilicmetalhead9454 6 months ago
Word embeddings: a vectorial representation of a word. The values in a word embedding describe various features of the word. Similar words' embeddings have a higher cosine similarity value.
Attention: the same word may mean different things in different contexts. How similar the word is to the other words in that sentence gives you an idea of what it really means. You start with an initial set of embeddings, take into account the other words in the sentence, and come up with new embeddings (trainable parameters) that better describe the word contextually. Similar/dissimilar words gravitate towards/away from each other, as their updated embeddings show.
Multi-head attention: take multiple possible transformations to potentially apply to the current embeddings and train a neural network to choose the best embeddings (contributions are scaled by how good the embeddings are).
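A compact sketch of those steps in NumPy (the dimensions, random matrices, and two-head setup below are placeholders, not the video's numbers): embeddings are linearly transformed into queries, keys, and values, word-to-word similarities weight the values, and several heads are concatenated.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, W_q, W_k, W_v):
    # Linear transformations of the embeddings into queries, keys, values.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # word-to-word similarities
    return scores @ V                                   # context-aware embeddings

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))        # 4 tokens, embedding size 8

# "Multi-head": independent transformations, each giving its own view,
# concatenated (a final learned matrix would then mix them).
heads = [attention_head(X, *rng.normal(size=(3, 8, 4))) for _ in range(2)]
print(np.concatenate(heads, axis=-1).shape)   # (4, 8)
```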
@homakashefiamiri3749 2 months ago
It was the most useful video explaining the attention mechanism. Thank you
@tanggenius3371 5 months ago
Thanks, the explanation is so intuitive. Finally understood the idea of attention.
@perpetuallearner8257 a year ago
You're my fav teacher. Thank you Luis 😊
@caryjason4171 8 months ago
This video helps to explain the concept in a simple way.
@yairbh 4 months ago
Great explanation with the linear transformation matrices. Thanks!
@dr.mikeybee a year ago
Nicely done! This gives a great explanation of the function and value of the projection matrices.
@justthefactsplease 8 months ago
What a great explanation on this topic! Great job!
@LuisOtte-pk4wd 10 months ago
Luis Serrano, you have a gift for explaining! Thank you for sharing!
@eddydewaegeneer9514 7 months ago
Great video and a very intuitive explanation of the attention mechanism
@JorgeMartinez-xb2ks a year ago
The best video I have seen on the subject. Thank you very much for this great work.
@hkwong74531 10 months ago
I subscribed to your channel immediately after watching this video, the first video I've watched from your channel but also the first that made me understand why the embedding needs to be multi-headed. 👍🏻👍🏻👍🏻👍🏻
@mostinho7 a year ago
7:00 Even with word embeddings, words can be missing context and there's no way to tell, like the word apple: are you talking about the company or the fruit? Attention matches each word of the input with every other word, in order to transform it or pull it towards a different location in the embedding based on the context. So when the sentence is "buy apple and orange", the word orange will cause the word apple to have an embedding or vector representation that's closer to the fruit 8:00
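A tiny numerical version of that example, with invented 2D embeddings (one axis loosely "tech", the other loosely "fruit"): scoring "apple" against the other words in the sentence and averaging pulls it toward whichever meaning the context supports.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Made-up 2D embeddings: axis 0 ≈ "tech-ness", axis 1 ≈ "fruit-ness".
emb = {
    "apple":  np.array([0.5, 0.5]),   # ambiguous on its own
    "orange": np.array([0.0, 1.0]),
    "laptop": np.array([1.0, 0.0]),
    "buy":    np.array([0.1, 0.1]),
}

def contextualize(word, sentence):
    # Score the word against every word in the sentence (dot-product
    # similarity), then pull it toward the words it is most similar to.
    vectors = np.stack([emb[w] for w in sentence])
    weights = softmax(vectors @ emb[word])
    return weights @ vectors

print(contextualize("apple", ["buy", "apple", "orange"]))  # leans toward fruit
print(contextualize("apple", ["buy", "apple", "laptop"]))  # leans toward tech
```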
@唐伟祚-j4v 8 months ago
It's so great, I finally understand these QKVs, they bothered me for so long. Thank you so much !!!
@mayyutyagi 5 months ago
Amazing video... Thanks sir for this pictorial representation and for explaining this complex topic in such an easy way.
@davutumut1469 a year ago
amazing, love your channel. It's certainly underrated.
@RamiroMoyano a year ago
This is amazingly clear! Thanks for your work!
@orcunkoraliseri9214 8 months ago
I've watched a lot about attention. You are the best. Thank you, thank you. I am also learning how to explain a subject from you 😊
@cyberpunkdarren 9 months ago
Very impressed with this channel and presenter
@WhatsAI a year ago
Amazing explanation Luis! As always...
@SerranoAcademy a year ago
Merci Louis! :)
@ThinkGrowIndia a year ago
Amazing! Loved it! Thanks a lot Serrano!
@neelkamal3357 2 months ago
I didn't get why we add a linear transformation. Earlier too we had embeddings in other planes, so why do a shear transformation? Please, someone answer
@VenkataraoKunchangi-uy4tg 6 months ago
Thanks for sharing. Your videos are helping me in my job. Thank you.
@bananamaker4877 a year ago
Explained very well. Thank you so much.
@vishnusharma_7 a year ago
You are great at teaching Mr. Luis
@arshmaanali714 3 months ago
Superb explanation❤ please make more videos like this
@erickdamasceno a year ago
Great explanation. Thank you very much for sharing this.
@Omsip123 5 months ago
Outstanding, thank you for this pearl of knowledge!
@jayanthAILab 8 months ago
Wow wow wow! I enjoyed the video. Great teaching sir❤❤
@tvinay8758 a year ago
This is a great explanation of the attention mechanism. I have enjoyed your Maths for Machine Learning course on Coursera. Thank you for creating such wonderful videos
@Cdictator 4 months ago
This is an amazing explanation! Thank you so much 🎉
@sathyanukala3409 9 months ago
Excellent explanation. Thank you very much.
@bravulo a year ago
Thanks. I also saw your "Math behind" video, but the third in the series is still missing.
@SerranoAcademy 11 months ago
Thanks! The third video is out now! kzbin.info/www/bejne/p5K6foKPm5mln5o
@BhuvanDwarasila-y8x 2 months ago
Thank you so much for the attention to the topic!
@SerranoAcademy 2 months ago
Thanks! Lol, I see what you did there! :D
@debarttasharan a year ago
Incredible explanation. Thank you so much!!!
@sukhpreetlotey1172 8 months ago
First of all thank you for making these great walkthroughs of the architecture. I would really like to support your effort on this channel. let me know how I can do that. thanks
@SerranoAcademy 8 months ago
Thank you so much, I really appreciate that! Soon I'll be implementing subscriptions, so you can subscribe to the channel and contribute (also get some perks). Please stay tuned, I'll publish it here and also on social media. :)
@alijohnnaqvi6383 9 months ago
What a great video man!!! Thanks for making such videos.
@orcunkoraliseri9214 8 months ago
Wooow. Such a good explanation for embedding. Thanks 🎉
@ignacioruiz3732 8 months ago
Outstanding video. Amazing to gain intuition.
@TemporaryForstudy a year ago
oh my god, I never understood V, K, Q as matrix transformations before. thanks Luis, love from India
9 months ago
My comment is just an array of letters for our algorithmic gods..Good stuff.
@tantzer6113 a year ago
Paraphrase: we weigh each embedding by its score, and then add up all these weighted embeddings to obtain a really good embedding. Question to think about: why not just take the best embedding? Is it because averaging improves robustness to noise?
@SerranoAcademy a year ago
That is a great question! Yes, one reason is robustness. Also, each embedding may capture different things; one could be good for a certain topic (say, fruits) but terrible at others (say, technology). Another reason is continuity. Let's say that you have embedding A, which has the highest score. The moment embedding B gets a higher score, you would switch abruptly from A to B, which creates a jump discontinuity. If you take the average instead, you would smoothly go from, say, 0.51*A + 0.49*B, to 0.49*A + 0.51*B, which is very similar.
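The continuity point can be seen with two toy embeddings (a minimal sketch with made-up numbers): as the scores cross over, the "take the best one" rule jumps from A to B, while the weighted sum moves only slightly.

```python
import numpy as np

A = np.array([1.0, 0.0])   # embedding A
B = np.array([0.0, 1.0])   # embedding B

for score_b in [0.49, 0.50, 0.51]:
    score_a = 1.0 - score_b
    best = A if score_a >= score_b else B        # hard choice: jumps abruptly
    blend = score_a * A + score_b * B            # weighted sum: changes smoothly
    print(f"score_b={score_b:.2f}  best={best}  blend={blend}")
```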
@tantzer6113 a year ago
Thanks for the answer, and for the wonderful video.
@tantzer6113 a year ago
Maybe the next video will clarify how the weighting is achieved. At first I thought the V matrix provides the weighting of the different embeddings, but now I am not sure.
@SerranoAcademy a year ago
@tantzer6113 yes! I thought the exact same thing, but then someone showed me that it doesn't; those weights are recorded inside the transformer. I'm seeing that the V matrix is another embedding in which the transformation is made (and the K and Q are used to find the distances). But I'll clarify this more in the next video.
@today-radio-in-the-zone 7 months ago
Thanks for your great effort to make people understand it. However, I would like to ask one thing: you have explained that V is the scores. Scores of what? My opinion is that V is the key vector, so that V maps the QK^T matrix back to a vector space again. Please make it clear for better understanding. Thanks!
@benhargreaves5556 10 months ago
Unless I'm mistaken, I think the linear transformations in this video incorrectly show the 2D axes as well as the object changing position, but in fact the 2D axes would stay exactly the same, with the 2D object rotating around them, for example.
@SulkyRain 11 months ago
Amazing explanation 🎉
@bankawat1 a year ago
Thanks for the amazing videos! I am eagerly waiting for the third video. If possible, please do explain how the K, Q, V matrices are used on the decoder side. That would be a great help.
@maysammansor 9 months ago
you are a great teacher. Thank you
@bagigiaasdtw43 a month ago
I recognize your face! I took your Coursera course! Thank you so much!
@HoussamBIADI 5 months ago
Thank you for this amazing explanation
@bengoshi4 a year ago
Yeah!!!! Looking forward to the second one!! 👍🏻😎
@SergeyGrebenkin 7 months ago
At last someone explained the meaning of Q, K and V. I read the original article and it just says "OK, let's have 3 additional matrices Q, K and V to transform the input embedding" ... What for? Thanks for the explanation, this video really helps!
@ProgrammerRajaa a year ago
Your videos are so awesome, please upload more videos, thanks a lot
@bbarbny 6 months ago
Amazing video, thank you very much for sharing!
@drintro 10 months ago
Excellent description.