The Attention Mechanism in Large Language Models

75,802 views

Serrano.Academy

10 months ago

Attention mechanisms are crucial to the huge boom LLMs have recently had.
In this video you'll see a friendly pictorial explanation of how attention mechanisms work in Large Language Models.
This is the first of a series of three videos on Transformer models.
Video 1: The attention mechanism at a high level (this one)
Video 2: The attention mechanism with math: • The math behind Attent...
Video 3: Transformer models • What are Transformer M...
Learn more in LLM University! llm.university

Comments: 153
@arvindkumarsoundarrajan9479 4 months ago
I have been reading the "Attention Is All You Need" paper for like 2 years. Never understood it properly like this ever before😮. I'm so happy now🎉
@user-bw5np7zz5m 13 days ago
I love your clear, non-intimidating, and visual teaching style.
@SerranoAcademy 13 days ago
Thank you so much for your kind words and your kind contribution! It’s really appreciated!
@malikkissoum730 6 months ago
Best teacher on the internet, thank you for your amazing work and the time you took to put those videos together
@RG-ik5kw 10 months ago
Your videos in the LLM uni are incredible. They build up true understanding after watching tons of other material that was all a bit loose at the ends. Thank you!
@EricMutta 5 months ago
Truly amazing video! The published papers never bother to explain things with this level of clarity and simplicity, which is a shame because if more people outside the field understood what is going on, we may have gotten something like ChatGPT about 10 years sooner! Thanks for taking the time to make this - the visual presentation with the little animations makes a HUGE difference!
@gunjanmimo 9 months ago
This is one of the best videos on YouTube to understand ATTENTION. Thank you for creating such outstanding content. I am waiting for the upcoming videos of this series. Thank you ❤
@calum.macleod 10 months ago
I appreciate your videos, especially how you apply a good perspective to understand the high-level concepts before getting too deep into the maths.
@TheMircus224 5 months ago
These videos where you explain transformers are excellent. I have gone through a lot of material; however, it is your videos that have allowed me to understand the intuition behind these models. Thank you very much!
@pruthvipatel8720 9 months ago
I always struggled with KQV in the attention paper. Thanks a lot for this crystal clear explanation! Eagerly looking forward to the next videos on this topic.
@apah 9 months ago
So glad to see you're still active, Luis! You and StatQuest's Josh Starmer really are the backbone of more ML professionals than you can imagine
@nealdavar939 1 month ago
The way you break down these concepts is insane. Thank you
@JyuSub 2 months ago
Just THANK YOU. This is by far the best video on the attention mechanism for people who learn visually
@mohandesai 10 months ago
One of the best explanations of attention I have seen without getting lost in the forest of computations. Looking forward to future videos
@SerranoAcademy 10 months ago
Thank you so much!
@aadeshingle7593 8 months ago
One of the best intuitions for understanding multi-head attention. Thanks a lot!❣
@ajnbin 4 months ago
Fantastic!!! The explanation itself is a piece of art. The step-by-step approach, the abstractions, ... Kudos!! Please make more of these
@hyyue7549 4 months ago
If I understand correctly, the transformer is basically an RNN model interspersed with a bunch of different attention layers. The attention layers redo the embeddings every time a new word comes in; the new embeddings are calculated based on the current context and the new word, then the embeddings are sent to the feed-forward layer and behave like the classic RNN model.
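For contrast with the RNN reading in the comment above, here is a minimal sketch of causal (masked) self-attention as used in decoder-only Transformers. It assumes toy 2-D vectors and skips the learned query/key/value projections, and `causal_self_attention` is a hypothetical name. Each position attends only to itself and earlier positions, and each output depends only on the input vectors, not on a hidden state carried over from the previous step, so all positions can be computed in parallel.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def causal_self_attention(xs):
    """Masked self-attention with identity Q/K/V projections (a simplification).

    Position i only sees positions 0..i, so its output never depends on later tokens.
    """
    d = len(xs[0])
    outputs = []
    for i, q in enumerate(xs):
        visible = xs[: i + 1]  # the causal mask: future positions are dropped
        weights = softmax([dot(q, k) / math.sqrt(d) for k in visible])
        outputs.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(tokens)
print(out[0])  # the first position can only attend to itself: [1.0, 0.0]
```

Changing the last token leaves the first two outputs unchanged, which is exactly what the mask guarantees.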
@bobae1357 2 months ago
Best description ever! Easy to understand. I've struggled to understand attention. Finally I can say I know it!
@mohameddjilani4109 7 months ago
I really enjoyed how you give a clear explanation of the operations and the representations used in attention
@MikeTon 3 months ago
This clarifies EMBEDDED matrices:
- In particular, the point on how a book isn't just a RANDOM array of words: matrices are NOT a RANDOM array of numbers.
- The visualization of the transform and shearing really drives home the V, Q, K aspect of the attention matrix that I have been STRUGGLING to internalize.
Big, big thanks for putting together this explanation!
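A minimal sketch of the "transformation" idea in the comment above: the query, key, and value matrices are linear maps applied to each embedding. Here a hypothetical 2×2 shear matrix stands in for a learned projection, and the two embeddings are made up; in a real model the matrices are learned and need not be square.

```python
def matvec(matrix, vector):
    # Multiply a matrix (given as a list of rows) by a column vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Hypothetical shear: x' = x + 0.5 * y, y' = y.
SHEAR = [[1.0, 0.5],
         [0.0, 1.0]]

# Made-up 2-D embeddings for illustration only.
embeddings = {"apple": [2.0, 2.0], "orange": [1.0, 3.0]}
projected = {word: matvec(SHEAR, vec) for word, vec in embeddings.items()}
print(projected)  # {'apple': [3.0, 2.0], 'orange': [2.5, 3.0]}
```

Because the map is linear, it keeps the origin fixed and sends straight lines to straight lines, which is why Q, K, and V can be pictured as stretching or shearing the whole embedding space at once.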
@saeed577 3 months ago
THE best explanation of this concept. That was genuinely amazing.
@drdr3496 3 months ago
This is a great video (as are the other 2), but one thing that needs to be clarified is that the embeddings themselves are not changed by attention (@10:49). The gravity-pull analogy is appropriate, but the visuals give the impression that the embedding weights change. What changes is the context vector.
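To make the distinction in this comment concrete, here is a minimal sketch assuming a toy embedding table and a simplified attention step with identity query/key/value projections; all names and numbers are hypothetical. During a forward pass, attention reads the embedding table but does not write to it: it returns new context vectors, which in the original Transformer are added back to the inputs through a residual connection (followed by layer normalization, omitted here).

```python
import copy
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(xs):
    # Simplified self-attention: dot-product scores, softmax weights,
    # weighted average of the input vectors (identity Q/K/V projections).
    d = len(xs[0])
    out = []
    for q in xs:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs])
        out.append([sum(wi * v[j] for wi, v in zip(w, xs)) for j in range(d)])
    return out

# Hypothetical embedding table.
EMBEDDING_TABLE = {"buy": [0.0, 1.0], "apple": [1.0, 1.0], "orange": [2.0, 0.5]}
snapshot = copy.deepcopy(EMBEDDING_TABLE)

sentence = ["buy", "apple", "orange"]
inputs = [EMBEDDING_TABLE[w] for w in sentence]  # embedding lookup
context = attend(inputs)                          # new context vectors
# Residual connection: the next layer sees input + context, as new lists.
hidden = [[x + c for x, c in zip(xv, cv)] for xv, cv in zip(inputs, context)]

assert EMBEDDING_TABLE == snapshot  # the table itself is untouched
```

The embedding table does change during training, when gradient descent updates it, but not as a side effect of running attention.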
@anipacify1163 2 months ago
Omg, this video is on a whole new level. This is prolly the best intuition behind transformers and attention. Best way to understand. I went through a couple of videos online and finally found the best one. Thanks a lot! Helped me understand the paper easily
@amoghjain 5 months ago
Thank you for making this video series for the sake of a learner and not to show off your own knowledge!! Great anecdotes and simple examples really helped me understand the key concepts!!
@sayamkumar7276 10 months ago
This is one of the clearest, simplest, and most intuitive explanations of the attention mechanism. Thanks for making such a tedious and challenging concept relatively easy to understand 👏 Looking forward to the upcoming 2 videos of this series on attention
@arulbalasubramanian9474 6 months ago
Great explanation. After watching a handful of videos, this one really makes it easy to understand.
@dr.mikeybee 10 months ago
Nicely done! This gives a great explanation of the function and value of the projection matrices.
@ccgarciab 2 months ago
This is such a good, clear and concise video. Great job!
@abu-yousuf 6 months ago
Amazing explanation, Luis. Can't thank you enough for your amazing work. You have a special gift for explaining things. Thanks.
@kevon217 8 months ago
Wow, clearest example yet. Thanks for making this!
@soumen_das 8 months ago
Hey Luis, you are AMAZING! Your explanations are incredible.
@docodemo727 5 months ago
This video really teaches you the intuition. Much better than the others I went through that just throw formulas at you. Thanks for the great job!
@karlbooklover 10 months ago
Best explanation of embeddings I've seen, thank you!
@JorgeMartinez-xb2ks 5 months ago
The best video I've seen on the subject. Thank you so much for this great work.
@dragolov 10 months ago
Deep respect, Luis Serrano! Thank you so much!
@pranayroy 3 months ago
Kudos to your efforts in clear explanation!
@RamiroMoyano 8 months ago
This is amazingly clear! Thanks for your work!
@orcunkoraliseri9214 2 months ago
I watched a lot about attention. You are the best. Thank you, thank you. I am also learning from you how to explain a subject 😊
@hkwong74531 4 months ago
I subscribed to your channel immediately after watching this video. It's the first video I've watched from your channel, and also the first that made me understand why embeddings need to be multi-headed. 👍🏻👍🏻👍🏻👍🏻
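A minimal sketch of multi-head attention, assuming two heads, 2-D toy embeddings, and hand-picked (hypothetical) projection matrices instead of learned ones. Each head projects the embeddings with its own query, key, and value matrices and runs scaled dot-product attention; the head outputs are then concatenated per token. In the original Transformer the concatenation is also multiplied by a learned output matrix, which is omitted here.

```python
import math

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def head(xs, wq, wk, wv):
    # One attention head: project, score, normalize, and mix the values.
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    d_k = len(ks[0])
    out = []
    for q in qs:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in ks])
        out.append([sum(wi * v[j] for wi, v in zip(w, vs)) for j in range(len(vs[0]))])
    return out

def multi_head(xs, heads):
    # Run every head on the same input and concatenate their outputs per token.
    per_head = [head(xs, *weights) for weights in heads]
    return [sum((h[i] for h in per_head), []) for i in range(len(xs))]

# Hypothetical (W_Q, W_K, W_V) triples, one per head.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SWAP = [[0.0, 1.0], [1.0, 0.0]]
HEADS = [(IDENTITY, IDENTITY, IDENTITY), (SWAP, IDENTITY, SWAP)]

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = multi_head(tokens, HEADS)
print(len(out), len(out[0]))  # 3 4
```

Each head sees the same tokens through a different linear "lens", so the heads can produce different attention patterns from the same sentence.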
@justthefactsplease 2 months ago
What a great explanation on this topic! Great job!
@sari54754 5 months ago
The easiest-to-understand video on the subject I've seen.
@user-dg2gt2yq3c 2 months ago
It's so great, I finally understand these QKVs; they bothered me for so long. Thank you so much!!!
@tvinay8758 9 months ago
This is a great explanation of the attention mechanism. I enjoyed your maths for machine learning course on Coursera. Thank you for creating such wonderful videos
@kafaayari 9 months ago
Well, the gravity example is how I finally understood this after a long time. You are a true legend.
@davutumut1469 10 months ago
Amazing, love your channel. It's certainly underrated.
@alijohnnaqvi6383 3 months ago
What a great video man!!! Thanks for making such videos.
@VenkataraoKunchangi-uy4tg 3 days ago
Thanks for sharing. Your videos are helping me in my job. Thank you.
@agbeliemmanuel6023 10 months ago
Wooow thanks so much. You are a treasure to the world. Amazing teacher of our time.
@caryjason4171 1 month ago
This video helps to explain the concept in a simple way.
@orcunkoraliseri9214 2 months ago
Wooow. Such a good explanation of embeddings. Thanks 🎉
@DeepakSharma-xg5nu 2 months ago
I did not even realize this video is 21 minutes long. Great explanation.
@satvikparamkusham7454 9 months ago
This is the most amazing video on "Attention Is All You Need"
@eddydewaegeneer9514 1 month ago
Great video and very intuitive explanation of the attention mechanism
@prashant5611 9 months ago
Amazing! Loved it! Thanks a lot Serrano!
@aaalexlit 7 months ago
That's an awesome explanation! Thanks!
@notprof 8 months ago
Thank you so much for making these videos!
@cyberpunkdarren 2 months ago
Very impressed with this channel and presenter
@bananamaker4877 6 months ago
Explained very well. Thank you so much.
@bankawat1 8 months ago
Thanks for the amazing videos! I am eagerly waiting for the third video. If possible, please explain how the K, Q, V matrices are used on the decoder side. That would be a great help.
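On the decoder question above: in the original encoder-decoder Transformer, the decoder's self-attention is masked so each position sees only earlier positions, and its cross-attention layers take their queries from the decoder's own representations while the keys and values come from the encoder's output. A minimal sketch of the cross-attention step, assuming identity projections and toy 2-D vectors; `cross_attention` is a hypothetical name.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(decoder_states, encoder_states):
    # Queries come from the decoder; keys and values come from the encoder.
    d = len(encoder_states[0])
    out = []
    for q in decoder_states:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in encoder_states]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, encoder_states)) for j in range(d)])
    return out

encoder_out = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # e.g. 3 source tokens
decoder_in = [[2.0, 0.0], [0.0, 2.0]]                # e.g. 2 target tokens so far
result = cross_attention(decoder_in, encoder_out)
print(len(result))  # one output per decoder position: 2
```

The output has one row per decoder position, but each row is built entirely from encoder vectors, which is how the decoder "looks at" the source sentence.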
@jeffpatrick787 4 months ago
This was great - really well done!
@perpetuallearner8257 9 months ago
You're my fav teacher. Thank you Luis 😊
@epistemophilicmetalhead9454 21 hours ago
Word embeddings
Vectorial representation of a word. The values in a word embedding describe various features of the word. Similar words' embeddings have a higher cosine similarity.
Attention
The same word may mean different things in different contexts. How similar the word is to the other words in the sentence gives you an idea of what it really means. You start with an initial set of embeddings, take into account the different words in the sentence, and come up with new embeddings (trainable parameters) that better describe the word in context. Similar/dissimilar words gravitate towards/away from each other, as their updated embeddings show.
Multi-head attention
Take multiple possible transformations to potentially apply to the current embeddings and train a neural network to choose the best embeddings (contributions are scaled by how good the embeddings are).
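The comment's point that similar words have embeddings with higher cosine similarity can be checked in a few lines. A minimal sketch; the 3-D vectors and their feature labels below are made up for illustration and do not come from any real model.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a · b) / (|a| |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: [fruitiness, tech-ness, roundness]
apple = [0.9, 0.4, 0.8]
orange = [1.0, 0.0, 0.9]
laptop = [0.0, 1.0, 0.1]

print(round(cosine_similarity(apple, orange), 3))
print(round(cosine_similarity(apple, laptop), 3))
```

Cosine similarity depends only on direction, not length, so two embeddings pointing the same way score close to 1 regardless of their magnitudes.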
@ignacioruiz3732 2 months ago
Outstanding video. Amazing to gain intuition.
@erickdamasceno 10 months ago
Great explanation. Thank you very much for sharing this.
@debarttasharan 9 months ago
Incredible explanation. Thank you so much!!!
@LuisOtte-pk4wd 3 months ago
Luis Serrano, you have a gift for explaining! Thank you for sharing!
@vishnusharma_7 9 months ago
You are great at teaching, Mr. Luis
@drintro 3 months ago
Excellent description.
@thelookerful 8 months ago
This is wonderful!!
@jayanthkothapalli9.2 2 months ago
Wow wow wow! I enjoyed the video. Great teaching sir❤❤
@maysammansor 2 months ago
You are a great teacher. Thank you
@WhatsAI 10 months ago
Amazing explanation Luis! As always...
@SerranoAcademy 10 months ago
Thank you, Louis! :)
@bengoshi4 9 months ago
Yeah!!!! Looking forward to the second one!! 👍🏻😎
@user-uq7kc2eb1i 4 months ago
This video is really clear!
@SulkyRain 4 months ago
Amazing explanation 🎉
@traveldiaries347 6 months ago
Very well explained ❤
@naimsassine 5 months ago
Super good job, guys!
@surajprasad8741 5 months ago
Thanks a lot Sir, clearly understood.
@khameelmustapha 10 months ago
Brilliant explanation.
@sukhpreetlotey1172 2 months ago
First of all, thank you for making these great walkthroughs of the architecture. I would really like to support your effort on this channel. Let me know how I can do that. Thanks
@SerranoAcademy 2 months ago
Thank you so much, I really appreciate that! Soon I'll be implementing subscriptions, so you can subscribe to the channel and contribute (also get some perks). Please stay tuned, I'll publish it here and also on social media. :)
@EigenA 4 months ago
Great video!
@divikchoudhary8873 8 days ago
This is just Gold!!!!!
2 months ago
My comment is just an array of letters for our algorithmic gods... Good stuff.
@serkansunel 3 months ago
Excellent job
@muhammadsaqlain3720 6 months ago
Thanks my friend.
@ProgrammerRajaa 9 months ago
Your videos are so awesome, please upload more videos. Thanks a lot
@ernesttan8090 4 months ago
Wonderful!
@waelmashal7594 5 days ago
Great video
@deeplearningwithjay 3 months ago
You are amazing!
@preetijani9658 5 months ago
Amazing
@shashankshekharsingh9336 24 days ago
Thank you sir 🙏, love from India💌
@bravulo 5 months ago
Thanks. I also saw your "Math behind" video, but I'm still missing the third in the series.
@SerranoAcademy 5 months ago
Thanks! The third video is out now! kzbin.info/www/bejne/p5K6foKPm5mln5o
@liminal6823 10 months ago
Fantastic.
@TemporaryForstudy 8 months ago
Oh my god, I never understood V, K, Q as matrix transformations. Thanks Luis, love from India
@samirelzein1095 10 months ago
The great Luis!
@angminhquan1491 1 month ago
Love the video
@RGUKTEDUIN 4 days ago
King of math
@s.chandrasekhar8290 7 months ago
Thank you!
@SerranoAcademy 6 months ago
Thank you so much for your contribution!!! How kind!
@today-radio-in-the-zone 22 days ago
Thanks for your great effort to make people understand it. However, I would like to ask one thing: you explained that V is the scores. Scores of what? My opinion is that V is the key vector, so that V maps the QKᵀ matrix back to vector space again. Please make it clear for better understanding. Thanks!
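On the question above about V: in scaled dot-product attention as defined in "Attention Is All You Need", the scores come from Q and K, not from V. The weights are softmax(QKᵀ / √d_k), computed row by row, and V holds one value vector per token; each output row is a weighted average of those value vectors. A minimal sketch with hypothetical toy matrices:

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(a, b):
    # (n x m) @ (m x p) -> (n x p), using lists of rows.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))  # how well each query matches each key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights  # mix the value vectors with those weights

# Hypothetical toy matrices: 2 tokens, d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
output, weights = scaled_dot_product_attention(Q, K, V)
```

Swapping in a different V leaves `weights` unchanged and only changes what gets mixed, which is the sense in which V supplies the content and QKᵀ supplies the scores.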
@mostinho7 5 months ago
7:00 Even with word embeddings, words can be missing context and there's no way to tell, like with the word apple: are you talking about the company or the fruit? Attention matches each word of the input with every other word, in order to transform it or pull it towards a different location in the embedding space based on the context. So when the sentence is "buy apple and orange", the word orange will cause the word apple to have an embedding or vector representation that's closer to the fruit. 8:00
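The "pull" described in this comment can be sketched as one simplified attention step: score every word in the sentence against "apple", turn the scores into weights with softmax, and replace "apple" by the weighted average of the embeddings. This is a minimal sketch under stated assumptions: the 2-D coordinates (axis 0 = fruit, axis 1 = tech company) are made up, and plain dot products stand in for learned query/key projections.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend_one(word, sentence, emb):
    # Score the word against every word in the sentence (including itself),
    # then move it to the softmax-weighted average of their embeddings.
    q = emb[word]
    scores = [sum(a * b for a, b in zip(q, emb[w])) for w in sentence]
    weights = softmax(scores)
    return [sum(wt * emb[w][j] for wt, w in zip(weights, sentence)) for j in range(len(q))]

# Hypothetical embeddings: axis 0 = "fruit", axis 1 = "tech company".
EMB = {
    "buy": [0.5, 0.5],
    "apple": [1.0, 1.0],   # ambiguous: halfway between fruit and company
    "orange": [3.0, 0.0],
    "phone": [0.0, 3.0],
}

fruit_context = attend_one("apple", ["buy", "apple", "orange"], EMB)
tech_context = attend_one("apple", ["buy", "apple", "phone"], EMB)
```

Next to "orange", the new vector for "apple" leans toward the fruit axis; next to "phone", it leans toward the company axis. The full mechanism also scales the scores by √d_k and uses learned projections, so this only illustrates the intuition.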
@BigAsciiHappyStar 21 days ago
13:32 "feel free to pause the video" reminds me of chess YouTuber agadmator 🤣