Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

310,843 views

Umar Jamil

A complete explanation of all the layers of a Transformer Model: Multi-Head Self-Attention, Positional Encoding, including all the matrix multiplications and a complete description of the training and inference process.
Paper: Attention is all you need - arxiv.org/abs/1706.03762
Slides PDF: github.com/hkproj/transformer...
Chapters
00:00 - Intro
01:10 - RNN and their problems
08:04 - Transformer Model
09:02 - Maths background and notations
12:20 - Encoder (overview)
12:31 - Input Embeddings
15:04 - Positional Encoding
20:08 - Single Head Self-Attention
28:30 - Multi-Head Attention
35:39 - Query, Key, Value
37:55 - Layer Normalization
40:13 - Decoder (overview)
42:24 - Masked Multi-Head Attention
44:59 - Training
52:09 - Inference

Comments: 563
@umarjamilai a year ago
Slides' PDF: github.com/hkproj/transformer-from-scratch-notes
@bhaskartripathi 5 months ago
I am not able to download the PDF file. My friends also tried. Would it be possible to put it on a downloadable link, please? Your content is too good and needs to be read again and again.
@mahek6110 4 months ago
@bhaskartripathi It's getting downloaded.
@hackie321 18 days ago
The best Transformer explanation on the internet so far, and I have seen almost all of them. Kudos! You are a true teacher. I dare to compare you with Andrew Ng. Please become a professor and not a corporate slave.
@gabrielnsionu8583 6 months ago
This is arguably the best explanation of multi-head attention on the internet, hands down. Very thorough, and most important to folks like me who use the attention mechanism as the underpinning of a novel neural architecture to be applied to a deep reinforcement learning architecture. Sir, please never stop making this type of video.
@umarjamilai 6 months ago
You're welcome! 🤓
@csikel22 6 months ago
I couldn't agree more. Best video on transformers I have seen so far. It doesn't get clearer than this. It would be very interesting to get some insight into why this whole thing works and what other variations and alternative architectures exist.
@rkbshiva 6 months ago
@umarjamilai bro, you're a legend!!!!
@pablofe123 a month ago
There are still a couple of things that are not explained well in the video. Are the Q, K and V matrices the same matrix? And where do the parameter matrices Wq, Wk and Wv come from? Besides that, excellent video.
@peregudovoleg a month ago
@pablofe123 At 21:25: "Q, K and V are the same matrices". As for the W matrices, he only says that they are "parameter matrices", and parameters are something we train during the training process.
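The exchange above can be illustrated with a minimal sketch, assuming a toy sequence of 3 tokens with embedding size 4. Q, K and V all start from the same input matrix `x`, and each is multiplied by its own parameter matrix. The names `w_q`, `w_k`, `w_v` and the helper functions are illustrative, not taken from the video, and the weights are random stand-ins for values that training would normally learn.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix."""
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Random stand-in for a parameter matrix (training would learn these values)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    # Q, K and V are all derived from the same input x.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights

rng = random.Random(0)
seq_len, d_model = 3, 4  # assumed toy sizes
x = random_matrix(seq_len, d_model, rng)    # stand-in for embeddings + positional encoding
w_q = random_matrix(d_model, d_model, rng)
w_k = random_matrix(d_model, d_model, rng)
w_v = random_matrix(d_model, d_model, rng)

output, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` sums to 1, and `output` has the same shape as `x`. In a real model the three weight matrices would be updated by backpropagation rather than left at their random initial values.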
@JulianHarris 16 days ago
I'm so glad I found this again. Do NOT rely on YouTube watch history: it doesn't show all of your history. This is definitely the best explanation of transformers and attention, and believe me, I've watched quite a few! Kudos again, Umar.
@umarjamilai 16 days ago
You should subscribe to the channel to never lose it 😇 thanks for the kind words.
@kerrykilian9127 22 days ago
best explanation of the paper on the whole internet
@orevjoker8332 4 days ago
I hardly ever comment on youtube videos, but wow this was a very well done video!
@DembaDiop-om3gv 5 months ago
The best explanation of "Attention is all you need" from my point of view, guys "This explanation is all you need". Thank you very much
@Udayanverma 7 months ago
I understand it much more deeply with your explanation. The rest of the world scares people with diagrams and tables without explaining the practical implementation. Thank you, dear!
@keviny2 9 days ago
Thanks Umar for the amazing video. This is the most comprehensive yet understandable walkthrough of the transformer architecture that I came across. Super helpful. I feel like I have a good foundation for tackling more complex LLMs because of it.
@sanskargupta7085 4 days ago
I feel lucky enough to have come across this channel, amazing stuff!
@tariqkhan1518 a month ago
TBH, the best explanation of attention on the whole internet.
@KunalTiwariBCI 25 days ago
Bro, legit the best explanation I have ever seen so far.
@utkarshashinde9167 a month ago
I cannot tell you how grateful I am for this explanation. Nowhere else did I find such a detailed and easy-to-understand description. A go-to video for every student preparing for interviews.
@vitoroliveiradesouza4214 15 days ago
I'm really glad to have found your video! Congratulations on the clean and yet detailed explanation
@ajithshenoy5566 6 months ago
Bless you, Umar. One of the finest tutorials out there. Please don't ever stop. We're willing to support you in every way possible.
@abc-by1kb 10 months ago
Such a great video! Explained all the key concepts so clearly and precisely while giving very nice intuition!
@shashankreddyboyapally4069 a month ago
The queries, keys and values are not divided by just separating them; they are brought to the size of the head by multiplying with a weight matrix, which is a learnable parameter. This also applies to the self-attention algorithm.
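The point about head-sized projections can be sketched as follows, assuming toy sizes (3 tokens, model width 4, 2 heads) and random stand-in weights. The names `w_q`, `heads_split` and `heads_per_head` are illustrative only. The sketch shows that projecting the input with one full weight matrix and then splitting the columns gives the same per-head queries as multiplying by a separate head-sized weight matrix for each head, so the split is never applied to the raw input alone.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def column_slice(m, start, stop):
    """Keep columns start..stop-1 of a matrix."""
    return [row[start:stop] for row in m]

def random_matrix(rows, cols, rng):
    """Random stand-in for learned parameters or input embeddings."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, num_heads = 3, 4, 2  # assumed toy sizes
d_k = d_model // num_heads             # width of each head

x = random_matrix(seq_len, d_model, rng)    # input sequence
w_q = random_matrix(d_model, d_model, rng)  # learnable query projection (random here)

# Option A: project with the full matrix, then split the result by columns.
q = matmul(x, w_q)
heads_split = [column_slice(q, h * d_k, (h + 1) * d_k) for h in range(num_heads)]

# Option B: give each head its own (d_model x d_k) weight matrix.
heads_per_head = [
    matmul(x, column_slice(w_q, h * d_k, (h + 1) * d_k)) for h in range(num_heads)
]
```

Both options produce `num_heads` matrices of shape `seq_len x d_k` with identical values. The same construction applies to the keys and values with their own weight matrices.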
@sushantpenshanwar8038 7 months ago
You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.
@silasnginyo7744 6 months ago
So far the best laid out presentation of Transformers I have ever walked through
@Patrick-wn6uj 2 months ago
This is the most important channel I have come across on YouTube. Keep creating these long-form videos; you are saving our lives in a huge way.
@brunogatti383 a month ago
Best video for attention mechanism hands down
@vrvlbl 4 months ago
Amazing explanation. I struggled too long to understand the architecture until I landed on your video. Way to go!!
@abhilashbalachandran7160 8 months ago
super useful. I really loved how you explain this with linear algebra. Very insightful. actually easier to understand than a lot of lectures at universities
@peregudovoleg a month ago
38:56 Normalizing in this case, through layer norm, doesn't squish our values into [0, 1], but rather transforms them to have mean = 0 and std = 1. I know, a bit confusing; some papers mix up normalization and standardization. For [0, 1] the formula is different: x = (x - x_min) / (x_max - x_min). Just asked GPT, it said: "it is a DS thing to call standardization a normalization." Great video nonetheless. I try to rewatch it every now and then just because it is so good and helps to visualize everything.
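The distinction drawn in this comment can be checked with a small sketch, assuming a single token with four made-up feature values. The function names are illustrative. The learnable scale and shift that layer normalization usually applies afterwards (gamma and beta in the paper's notation) are left out to keep the comparison simple.

```python
import math

def standardize(values, eps=1e-5):
    """Layer-norm style standardization: subtract the mean, divide by the std."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps avoids division by zero when all values are equal.
    return [(v - mean) / math.sqrt(variance + eps) for v in values]

def min_max_scale(values):
    """Min-max scaling: (x - x_min) / (x_max - x_min), result lies in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

features = [1.0, 2.0, 3.0, 4.0]  # assumed features of one token

standardized = standardize(features)  # mean 0, std close to 1, includes negatives
scaled = min_max_scale(features)      # smallest value maps to 0, largest to 1
```

The standardized values are centred on zero and include negative numbers, while the min-max scaled values stay inside [0, 1], which is the difference the comment points out.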
@Nereus22 5 months ago
This is really a great video, exactly what I was searching for! Everything that you mentioned was explained in detail (others skip a lot).
@SagarVibhute 6 months ago
Kudos on the commendable work, and simplified explanation! I appreciate that you are also trying to explain the intuition behind each step and not just math. I'll view and re-view this a few times to understand more with successive passes. Thank you!
@cristinaballesteros93 3 months ago
I have watched a lot of videos about transformers, and this is by far the best one. I finally understand how they work. Thank you so much!
@ddstar 4 months ago
Excellent. You answered a lot of questions I had about where the weights come from and how they were updated
@AvinashKumar-pb2op a month ago
Best Explanation Ever Existed in the whole Universe !!
@_seeker423 3 months ago
The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!
@_seeker423 2 months ago
One thing that I felt was missing is a logical explanation of the role of the value vector (V).
@keithchua1723 2 months ago
Spent days trying to understand this and I wished I had come across this video first because now I understand everything fully. Immediately subscribed, keep it up!!
@mculabs 4 months ago
Probably the best explanation of the paper and the encoder and decoder sub layers. Kudos!!
@ishaanjoshi6959 5 months ago
The best explanation of the attention mechanism I have found online. Thank you so much, Umar, for making this video.
@NazerkeSafina 2 months ago
This is brilliant. Thank you Umar for your hard work. Please keep new videos coming. You are helping immensely. May you live long and happy and healthy
@jdbrinton 6 months ago
the clearest description I've found to-date. bravo!
@AIVidya 6 months ago
One of the best transformer videos I have encountered so far.
@AbhinavSharma-dc3kv a month ago
the best explanation for attention architecture. kudos to you sir!
@megatroneata9911 4 months ago
After watching this video and the stable diffusion video, I can say for sure that you are an amazing teacher. Extremely digestible content and easy to follow along.
@albert4392 10 months ago
I really appreciate your talent for presenting knowledge. Nice explanation, thank you so much!
@lethnis9307 a month ago
Finally, after a lot of articles and videos, I found a video I could understand. Thank you, sir. I am not strong in math, but I think I understood a lot with this explanation.
@sergewilsonmendy9051 11 months ago
Thank you man, this is the best transformer video I've seen. Well explained and very detailed.
@hamzaomari7052 2 months ago
This is the best explanation. It took me 4 hours to take notes, revise, and follow you word by word, with intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail. That is something you cannot find in any other video. Thank you so much, sir, this is very instructive and helpful.
@70152136 5 months ago
Your presentation skills are simply amazing!!! Best video on transformers I've seen so far.
@zeeshanmehdi3994 3 months ago
can't thank you enough, this is the best explanation of transformers i could find after trying for days to understand it. Thank you ❤
@tgyawali 6 months ago
Thank you, so much for putting together such a detailed video. This helps technical people who do not have a lot of experience in research but have some background in machine learning to understand this very important and historic paper in AI.
@NJCLM 4 months ago
This video is surely in the top 3 of the 50 videos that I watched to understand this subject. We are very grateful to you. Keep up the energy, the YouTube numbers will follow!
@marsupilami125 2 months ago
Can you tell me the other 2?🙏
@tipu461 10 months ago
I really appreciate your efforts to make it understandable for us 👍. Thanks a lot.
@saravanannatarajan6515 3 months ago
One of the best videos I have seen on this topic. Thanks a lot for making it easy for us. Great effort, hats off!
@Kingissingh08 8 days ago
Best video on transformer
@gauravmalik3911 3 months ago
Detailed explanation. Great work explaining a difficult topic by dividing it into chunks; I don't think any part was missed. Best explanation.
@haoming3430 2 months ago
Your video is very helpful and easy to follow. I have to say this is the best tutorial about transformer I've seen.
@ameyadesai6382 7 months ago
The best explanation on this paper, can't wait to see the other videos on this topic.
@dalilabdouraman3557 5 months ago
Definitely the best explanation of multi-head attention in the transformer... just awesome.
@juwanyirenda3457 6 months ago
Excellent exposition! Thank you Umar for the great work.
@debjyotimukherjee8275 2 months ago
Excellent video gave a complete description with a great explanation. Looking forward to more such amazing content!
@channel8048 10 months ago
This is very clear! Better than anything I have read up till now. Grazie!
@profyao a month ago
Absolutely the best explanation for multi-head attention so far!
@anirudhjoshi1607 6 months ago
This is the clearest explanation on this paper I have ever heard. Always had doubts about Multi-Head attention and now finally I can visualise this 100%. Thanks a lot Umar Jamil.
@1tahirrauf 9 months ago
Umar! You nailed it. Please make more videos. It was truly helpful. Thank you.
@lyte69 7 months ago
Thank you for your great explanation and effort, this was very informative and honestly there are no problems with the video, it's only a preference for me if there was some code alongside each part explained so it's even better understood, but I want you to know that this was a huge help thank you again. ❤
@ankitkacchap a month ago
Awesome explanation. Even our professor doesn't explain it like you did. Thanks to the YouTube recommendation, and special thanks to you.
@JohnSmith-he5xg 7 months ago
The best overview I've seen. Great job!
@subinaypanda9936 12 days ago
Your explanation just hits my mind. You explained all the points, where I was facing problems to understand. It's like you can read my mind from past huh 😜. Yes subscribed.
@madhuvamsi7055 7 months ago
You've definitely earned a lifelong subscriber bro! Great video.
@rkjellbe 7 months ago
Thank you, Umar. This was very helpful and I feel I have a much better understanding of the process now. Great work!
@ltbd78 2 months ago
You are incredible. Please continue making this type of tutorial.
@aeigreen 9 months ago
Great explanation. Thank you for demystifying the transformer. I came to your explanation after watching countless videos on transformers, and yours is simply the best.
@shuchenwu170 2 months ago
This tutorial translates complex and terse structures into intuitions. A masterpiece of tutorials!
@saima6759 2 months ago
transformer model never got so clear to me! thank you Umar!
@atrijpaul4009 5 months ago
Best explanation of attention on all of YouTube!!!!! Thank you, sir, for making this video and helping us.
@ActualCode0 6 months ago
I like how u used examples and drew out the matrices to show what was going on in the attention block. It rly helped me understand the concept better
@shawkontzu642 6 months ago
Super job explaining the differences between training and inference; that also clears up my confusion about "time step = 1".
@umarjamilai 6 months ago
That was the biggest source of confusion for myself as well. Glad it helped.
@calewang3713 8 months ago
Oh Man, you deserve a Turing Award.....
@sujeethav9885 a month ago
This is just perfect! A wholesome video on Transformers!
@priyanjaligoel4294 4 months ago
OMG! I love it. Finally, so many answers to my questions. I had a very abstract version of the process in my head before, but now it's much clearer. Thank you so much!
@danielvillalba4457 5 months ago
Lots of new insights about transformers technology, every document and video provides more details, great video sir!
@yuk-hoiyiu7023 4 months ago
The only video that explains the difference between training and inference in the Transformer model!
@andreicristea997 7 months ago
Finally the fancy "black box" called transformer became more understandable for me. Really interested in the other content you are making. Thanks for the explanation.
@skc909887u 8 months ago
This is the best explanation for an engineer, for sure. Love this.
@rajkrishnamurthy8474 8 months ago
Love it Umar. This is the best explanation of the paper. Thank you very much.
@BritskNguyen 3 months ago
this is the best lecture on transformer one can get, period.
@hugopristauz538 8 months ago
good job - your single stepping (with remarking) is really helpful
@nadyaabdel5559 4 months ago
Amazing explanation. First time every bit is super clear. Thank you.
@sudzam a month ago
What a wonderful video with clear explanation! Thanks for making this and sharing with the community.
@Zineb-ru8bp 6 months ago
I was struggling trying to understand Transformers but you make it easy for me. Thank you so much
@user-pz5nn2kg2j 4 months ago
The clearest video explaining the Transformer that I have ever seen. Thanks very much for your efforts. I really appreciate your method of explaining every step with concrete examples and explicitly giving the shapes of all the matrices involved. The shapes of the matrices in each step are the most confusing part for me in understanding Transformer models, and you made it so clear for me. Thanks a lot, Umar.
@umarjamilai 4 months ago
You're welcome! Feel free to connect on LinkedIn.
@Stephanfreund 5 months ago
Awesome explanation for those who seek to truly understand the fundamentals of the most important paper of this decade
@richeek10 3 months ago
Such a nice explanation with a soothing voice. Thanks so much!
@nirajdesai 2 months ago
Brilliant explanation of basics - thanks for putting this video together!
@aurelagbodoyetin3321 6 months ago
This is a masterclass. Thank you for your work
@MichaelJentsch 6 months ago
Hi, I wanted to express my thanks for your fantastic video. Your clarity and expertise made a complex topic incredibly accessible. Your video has been a meaningful change for me.
@umarjamilai 6 months ago
Thank you for your kind words, Michael! Have a nice day
@bsuhaib 9 months ago
This is called decoding a transformer. What I really liked was explaining each chunk. That was really helpful for this topic and surely taught me the approach to decode any problem. Jazaakallah ul Khair
@parametaorto 8 months ago
Hi there! I watched it from start to end and wrote down all the info in my notebook. It was soooo interesting! Thank you for the explanation!! It was very clear and helpful!
@srikanthvoleti5942 3 months ago
Superb video, the best explanation, I have been trying to understand transformers for a long time and this definitely helped me a lot
@arnonil 5 months ago
Thank you for the excellent introduction. I'm looking forward to your advanced topic videos on Transformers, especially those that include examples of using Transformers for various tasks or scenarios with only the Encoder.
@umarjamilai 5 months ago
You should watch my explanation about BERT then ;-)
@koko-wf8vz 7 months ago
Thank you so much for this video, hands down the best in-depth video I have seen. I love the graphical explanations; they help a math noob visualize the matrices :) Much love.
@somdubey5436 4 months ago
you have put such a hard work to explain it so clearly.....hats off to you :)
@LinhLe-we7bw 8 months ago
Thank you so much. In my opinion, this video explains things very clearly. It helped me understand complex terms in the Transformer.
@brothachris 10 months ago
Excellent tutorial! Please keep up the great work.
@xue8888 a year ago
Thank you man, you are amazing. Keep it up ❤ good luck, I have fingers crossed for your success