Attention is all you need (Transformer) - Model explanation (including math), Inference and Training

Views: 404,823

Umar Jamil

Comments: 677

@umarjamilai 1 year ago

Slides' PDF: github.com/hkproj/transformer-from-scratch-notes

@bhaskartripathi 10 months ago

I am not able to download the PDF file. My friends also tried. Would it be possible to put it on a downloadable link, please? Your content is too good and needs to be read again and again.

@mahek6110 9 months ago

@bhaskartripathi It's downloading fine.

@gabrielnsionu8583 11 months ago

This is arguably the best explanation of multi-head attention on the internet, hands down. Very thorough, and most important to folks like me who use the attention mechanism as the underpinning mechanism in developing a novel neural architecture to be applied to a deep reinforcement learning architecture. Sir, please never stop making this type of video.

@umarjamilai 11 months ago

You're welcome! 🤓

@csikel22 11 months ago

I couldn't agree more. Best video on transformers I have seen so far. It doesn't get clearer than this. It would be very interesting to get some insight into why this whole thing works and what other variations and alternative architectures exist.

@rkbshiva 11 months ago

@umarjamilai bro you're a legend!!!!

@pablofe123 6 months ago

There are still a couple of things that are not explained well in the video. Are the Q, K and V matrices the same matrix? And where do the parameter matrices Wq, Wk and Wv come from? Besides that, excellent video.

@peregudovoleg 6 months ago

@pablofe123 At 21:25: "QKV are the same matrices". As for the W matrices, he only says that they are "parameter matrices", and parameters are what we learn during the training process.
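To make the exchange above concrete, here is a minimal NumPy sketch of single-head self-attention. It assumes the usual setup from the paper: the same input matrix is used as Q, K and V, and each copy is multiplied by its own parameter matrix. The names `w_q`, `w_k`, `w_v`, the random initial values, and the sizes (6 tokens, `d_model = 512`) are illustrative assumptions; in a real model the three parameter matrices start random and are updated by training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # In self-attention, Q, K and V all start from the same input x.
    # They differ only through the learned parameter matrices.
    q = x @ w_q
    k = x @ w_k
    v = x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 6, 512                # illustrative sizes (assumption)
x = rng.normal(size=(seq_len, d_model))  # stand-in for embedded input tokens

# Random stand-ins for the trainable parameter matrices (assumption).
w_q = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
w_k = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)
w_v = rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

The output keeps the input shape (one row per token), while `weights` holds one attention distribution per token over all tokens in the sequence.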

@DembaDiop-om3gv 10 months ago

The best explanation of "Attention is all you need" from my point of view, guys "This explanation is all you need". Thank you very much

@utkarshashinde9167 7 months ago

I cannot tell you how grateful I am for this explanation. Nowhere else have I found such a detailed and easy-to-understand description. A go-to video for every student preparing for interviews.

@JulianHarris 5 months ago

I'm so glad I found this again. Do NOT rely on YouTube watch history; it doesn't look at all your history. This is definitely the best explanation of transformers and attention, and believe me, I've watched quite a few! Kudos again Umar.

@umarjamilai 5 months ago

You should subscribe to the channel to never lose it 😇 thanks for the kind words.

@hackie321 5 months ago

The best Transformer explanation on the internet so far, and I have seen almost all of them. Kudos! You are a true teacher. I dare to compare you with Andrew NG. Please become a professor and not a corporate slave.

@laodrofotic7713 4 months ago

I think Dr. Umar Jamil is way better than Andrew NG, and I did his courses and think he is great too, but this person is way better.

@uditapatel778 1 month ago

Way better than Andrew NG for sure at least for my learning style. Prof Andrew is great too though.

@sushantpenshanwar8038 1 year ago

You did the best job of describing the complicated details in a fluid manner. Sat, watched and took notes in one sitting. Hands down best one so far.

@lucasmolter1040 24 days ago

One cannot say it for sure because there is an infinite number of explanations on YouTube... but I can say that this is the best I have seen. Congrats on the great quality and on all the effort that you clearly put into the material.

@sinaabdi8033 3 months ago

I have read and watched a lot to understand the Transformer architecture. However, this is the best one of them so far. Nobody went to this level of minute details as you went. Thank you. Please keep it up.

@xray788 1 month ago

These kinds of videos just make MIT videos look like rookies. Thank you Umar, may God bless you.

@ChadieRahimian 9 days ago

Best explanation of the paper on YouTube. I love your style, which is tailored for people who know the basics on an academic level. It's like sitting in a really good graduate-level course at university. You are such a good teacher!

@saravanannatarajan6515 3 months ago

What a gem of a video! I would request people to read the paper and then come back here so that you will understand the value we get from the instructor. Awesome work, keep it up!

@hamzaomari7052 7 months ago

This is the best explanation. It took me 4 hours to take notes, revise, and follow you word by word, with intuitions, and now I feel that I truly understand the transformer architecture and the mathematical intuition behind every detail. That is something you cannot find in any other video. Thank you so much sir, this is very instructive and helpful.

@barretvermilion6359 2 months ago

Your video has clarified and tied together the missing pieces from reading papers and watching other videos, and is the best explanation I've seen. My background is in psychology and psychometrics, so learning transformer architectures for my dissertation has been a slog - but you've saved me a lot of time wasted on confusing explanations. Thank you so much!

@rajatadimeti2398 2 months ago

This is one of the best, most compact and precise explanations of the transformer architecture that I could find on YouTube. Thanks for all the effort you have put in.

@nabanitadash7085 2 months ago

I have been religiously watching your videos and they have helped me understand difficult papers so smoothly. Kudos 👏 you are doing a great job. It feels like you are the next Andrej Karpathy.

@NJCLM 9 months ago

This video is surely in the top 3 of the 50 videos that I watched to understand this subject. We are very grateful to you. Keep up the energy, the YouTube numbers will follow!

@marsupilami125 8 months ago

Can you tell me the other 2?🙏

@snehotoshbanerjee1938 5 months ago

Umar, you are a great teacher. I have not seen such a great explanation of transformer. Your transformer from scratch coding is also awesome. So, basically you understand which part needs more explanation. Thanks for your effort.

@ajithshenoy5566 11 months ago

Bless you, Umar. One of the finest tutorials out there. Please don't ever stop. We're willing to support you in every way possible.

@Jafar801 1 month ago

You deserve a larger following and more recognition in the ML community.

@BowenXie-b7b 9 months ago

The clearest video explaining the Transformer I have ever seen. Thanks very much for your efforts. I really appreciate your method of explaining every step with a concrete example and explicitly giving the shapes of every matrix involved. The shapes of the matrices in each step are the most confusing part of understanding Transformer models for me, and you made them so clear. Thanks a lot Umar.

@umarjamilai 9 months ago

You're welcome! You can get in touch on LinkedIn.

@keithchua1723 7 months ago

Spent days trying to understand this and I wished I had come across this video first because now I understand everything fully. Immediately subscribed, keep it up!!

@rachadlakis1 4 months ago

Wow, this is an incredibly detailed explanation of the Transformer Model! Thank you for sharing all the insights and resources. Understanding the layers and processes involved is crucial for anyone working with this model. Keep up the great work!

@jamesmina7258 3 months ago

the best laid out presentation of Transformers, thank you Umar Jamil🥰

@laodrofotic7713 4 months ago

I must say it started off a bit badly when you started writing with the red stick; I almost tuned out. It turns out I have to agree that this is the best explanation of self-attention I have seen on YouTube. Congratulations, this is really good and properly explained, especially the QKV part.

@_seeker423 8 months ago

The clearest explanation of a very important breakthrough paper that I have seen on YouTube. Thank you!

@_seeker423 8 months ago

One thing that I felt was missing is the logical explanation of what is the role of value vector (V).

@keviny2 5 months ago

Thanks Umar for the amazing video. This is the most comprehensive yet understandable walkthrough of the transformer architecture that I came across. Super helpful. I feel like I have a good foundation for tackling more complex LLMs because of it.

@mail2say 2 months ago

Very clear, precise explanation! Went through many articles and videos, but was never clear of concept. Well thought-out presentation. Now eager to go through your other videos. 👍

@bsuhaib 1 year ago

This is called decoding a transformer. What I really liked was explaining each chunk. That was really helpful for this topic and surely taught me the approach to decode any problem. Jazaakallah ul Khair

@mculabs 10 months ago

Probably the best explanation of the paper and the encoder and decoder sub layers. Kudos!!

@Udayanverma 1 year ago

I understand much more deeply thanks to your explanation. The rest of the world is scaring people with diagrams and tables without explaining the practical implementation. Thank you, dear!

@parthvadera1 3 months ago

I love the way you’ve explained it using matrices. Had some doubts after watching Andrej’s video, this clears it. Thank you so much!

@abhilashbalachandran7160 1 year ago

super useful. I really loved how you explain this with linear algebra. Very insightful. actually easier to understand than a lot of lectures at universities

@vrvlbl 9 months ago

Amazing explanation. I struggled too long to understand the architecture until I landed on your video. Way to go!!

@lethnisoff 6 months ago

Finally, after a lot of articles and videos, I found a video I could understand. Thank you, sir. I am not strong in math, but I think I understood a lot with this explanation.

@ddstar 9 months ago

Excellent. You answered a lot of questions I had about where the weights come from and how they were updated

@huseyngorbani6544 1 year ago

This video is hands down the best explanation I've come across so far! The level of detail provided is fantastic, but if there's one aspect I'd love to delve deeper into, it's the normalization part. It would be incredibly helpful if you could expand on that topic a bit more. Furthermore, I'm quite curious about the process of weight learning. With so many weights involved, such as those for Q, K, V, and the fully connected layer, as well as the weights in the decoder part, understanding how they are learned would be immensely valuable. If you have any recommended resources or links that explain this aspect, I would greatly appreciate it. Thanks again for the amazing content!

@umarjamilai 1 year ago

Hi @huseyngorbani6544 The process of weight learning is determined exclusively by the back-propagation algorithm. Since it's a fundamental algorithm in machine learning, I will make a video on how it works and how to write an autograd system from scratch, so that anyone, even with little maths background, can understand it. As you know, making videos, especially when it's not your source of income, is very difficult. I try to make high-quality content for free, not only for my own personal pleasure in teaching, but especially to help others struggling to enter this magical world called AI. Have faith and I'll try to satisfy everyone's requests. Have a wonderful day with your family, friends, pets (and VS code)!

@huseyngorbani6544 1 year ago

@umarjamilai Oh understood. Thank you.

@smartwakeAI 1 year ago

@umarjamilai Thanks for being such a genuine human being. Being extraordinarily smart and remaining humble at the same time is a difficult challenge that most highly intelligent people seem to fail. I am fairly new to AI and I loved your video! Thanks for making those videos! They are super helpful!

@183lucrido_ase 1 year ago

@umarjamilai You make it to help us and it works, thank you.

@calewang3713 1 year ago

Oh Man, you deserve a Turing Award.....

@gauravmalik3911 8 months ago

Detailed explanation. You did great work explaining a difficult topic by dividing it into chunks; I don't think any part was missed. Best explanation.

@kerrykilian9127 5 months ago

best explanation of the paper on the whole internet

@silasnginyo7744 11 months ago

So far the best laid out presentation of Transformers I have ever walked through

@Patrick-wn6uj 7 months ago

This is the most important channel I have come across on YouTube. Keep creating these long-form videos; you are saving our lives in a huge way.

@megatroneata9911 9 months ago

After watching this video and the stable diffusion video, I can say for sure that you are an amazing teacher. Extremely digestible content and easy to follow along.

@shawkontzu642 11 months ago

Super job explaining the differences between training and inference; that also cleared up my confusion about "time step = 1".

@umarjamilai 11 months ago

That was the biggest source of confusion for myself as well. Glad it helped.

@tgyawali 11 months ago

Thank you, so much for putting together such a detailed video. This helps technical people who do not have a lot of experience in research but have some background in machine learning to understand this very important and historic paper in AI.

@NazerkeSafina 7 months ago

This is brilliant. Thank you Umar for your hard work. Please keep new videos coming. You are helping immensely. May you live long and happy and healthy

@blacksword06 13 days ago

the best explanation I have ever seen about transformer architecture. Thanks a lot.

@abdulmajid8731 10 months ago

It would be harsh not to rate this at the top. Absolutely the best explanation so far around the 'world'. Thanks, Umar, for your efforts. Keep up the good work.

@haoming3430 7 months ago

Your video is very helpful and easy to follow. I have to say this is the best tutorial about transformer I've seen.

@andreicristea997 1 year ago

Finally the fancy "black box" called transformer became more understandable for me. Really interested in the other content you are making. Thanks for the explanation.

@Stephanfreund 10 months ago

Awesome explanation for those who seek to truly understand the fundamentals of the most important paper of this decade

@sedthh 1 year ago

Thank you, this was really helpful! One minor correction: the LayerNorm does not normalize to a 0-1 range rather it standardizes to 0 mean with unit variance.

@umarjamilai 1 year ago

You're right! Thanks for pointing out.

@nithinma8697 1 month ago

This is the best explanation of Transformers I have ever come across. Thank you for sharing. Expecting more such quality content 😊😊😊😊😊

@MichaelJentsch 11 months ago

Hi, I wanted to express my thanks for your fantastic video. Your clarity and expertise made a complex topic incredibly accessible. Your video has been a meaningful change for me.

@umarjamilai 11 months ago

Thank you for your kind words, Michael! Have a nice day

@ameyadesai6382 1 year ago

The best explanation on this paper, can't wait to see the other videos on this topic.

@abc-by1kb 1 year ago

Such a great video! Explained all the key concepts so clearly and precisely while giving very nice intuition!

@fransvanbuul3098 1 year ago

Thank you so much, Sir. This was fantastic. I tried to work through the paper on my own, but not being an expert in the field, it was too dense for me to get through. I tried to find other resources to explain it to me, but they all seemed to stop short of really understanding it. Reading the paper again after your clear explanation, I finally think I understand most of it.

@umarjamilai 1 year ago

You're welcome! Always makes me happy to know I've helped somebody.

@SagarVibhute 11 months ago

Kudos on the commendable work, and simplified explanation! I appreciate that you are also trying to explain the intuition behind each step and not just math. I'll view and re-view this a few times to understand more with successive passes. Thank you!

@shuchenwu170 7 months ago

This tutorial translates complex and terse structures into intuitions. A masterpiece of tutorials!

@atulsain6170 1 year ago

Wow.. Thank you so much.❤ I have been banging my head against different papers, books, and videos for the last two days. It's the best explanation I could find.

@umarjamilai 1 year ago

Thanks! You should watch my other video on how to code the Transformer from scratch, that will also give you practical experience.

@AIVidya 11 months ago

One of the best transformer videos I have encountered so far.

@Nereus22 10 months ago

This is really a great video, exactly what I was searching for! Everything that you mentioned was explained in detail (others skip a lot).

@limpub 5 days ago

❤ Best explanation of transformers. Thank you so much! The clearest explanation of the Transformer, thank you very much!

@AbhinavSharma-dc3kv 6 months ago

the best explanation for attention architecture. kudos to you sir!

@brunogatti383 6 months ago

Best video for attention mechanism hands down

@ariffahla482 10 months ago

I seldom comment in a youtube video.. but this is just too good to pass. Thank you Umar for your relatively easy and comprehensible video on such a complex subject. It helps me a lot! You are awesome!

@umarjamilai 10 months ago

You're welcome!

@ayeshanaikodi6764 11 days ago

My favorite explanation of Transformer model! Thank you!

@ActualCode0 11 months ago

I like how u used examples and drew out the matrices to show what was going on in the attention block. It rly helped me understand the concept better

@Zineb-ru8bp 11 months ago

I was struggling trying to understand Transformers but you make it easy for me. Thank you so much

@yuk-hoiyiu7023 9 months ago

The only video that explains the difference between training and inference in the Transformer model!

@ishaanjoshi6959 10 months ago

The best explanation of attention based mechanism I found online , thank you so much Umar for making this video.

@peregudovoleg 6 months ago

38:56 Normalizing through layer norm in this case doesn't squish our values into [0, 1], but rather transforms them to have mean = 0 and std = 1. I know, it's a bit confusing; some papers mix up normalization and standardization. For [0, 1] the formula is different: x = (x - x_min) / (x_max - x_min). I just asked GPT, and it said: "it is a DS thing to call standardization a normalization." Great video nonetheless. I try to rewatch it every now and then just because it is so good and helps to visualize everything.
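The distinction made in this comment can be checked numerically. Below is a minimal NumPy sketch that compares layer-norm-style standardization with min-max scaling on the same vector. It is a simplification: the learnable scale and shift parameters of a real layer norm are left out, and the function names, the `eps` value and the sample values are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Standardize along the feature axis: zero mean, unit variance.
    # The learnable scale and shift of a full layer norm are omitted here.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def min_max_scale(x):
    # Rescale along the feature axis into the range [0, 1].
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    return (x - x_min) / (x_max - x_min)

x = np.array([[1.0, 2.0, 3.0, 10.0]])  # one token with four features

standardized = layer_norm(x)
scaled = min_max_scale(x)

print("layer norm:", standardized, "mean", standardized.mean(), "std", standardized.std())
print("min-max:   ", scaled)
```

The standardized values include negative numbers and values above 1, which min-max scaling never produces.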

@hugopristauz538 1 year ago

good job - your single stepping (with remarking) is really helpful

@aeigreen 1 year ago

Great explanation. Thank you for demystifying the transformer. I came to your explanation after watching countless videos on transformers, and yours is simply the best.

@MaheshFernando-h5u 4 months ago

This is the best explanation I have seen among all the resources, including even paid Coursera courses.❤❤

@aanchalmahajan3821 4 months ago

This is the video that I was looking for which provides the best explanation covering each and every detail of the architecture. After wasting a lot of time I just got ur video which cleared my doubts. Thanks a lot for such video and do share your courses if any on the udemy business platform. Would be an honor to study 😊😊

@Allinone-wi2un 4 months ago

I am really thankful for your dedication to conveying quality material from scratch and explaining it so cleanly, without any confusion. One last request: I would be glad if you could provide the Google Colab notebook link.

@dalilabdouraman3557 10 months ago

Definitely the best explanation of multi-head attention in the transformer... just awesome.

@atrijpaul4009 10 months ago

Best explanation of Attention on all of YouTube!!!!! Thank you sir for making this video and helping us..

@arnonil 10 months ago

Thank you for the excellent introduction. I'm looking forward to your advanced topic videos on Transformers, especially those that include examples of using Transformers for various tasks or scenarios with only the Encoder.

@umarjamilai 10 months ago

You should watch my explanation about BERT then ;-)

@tariqkhan1518 6 months ago

TBH The best Explanation of Attention in whole Internet.

@skc909887u 1 year ago

This is the best explanation for an engineer for sure .love this

@vitoroliveiradesouza4214 5 months ago

I'm really glad to have found your video! Congratulations on the clean and yet detailed explanation

@zeeshanmehdi3994 8 months ago

can't thank you enough, this is the best explanation of transformers i could find after trying for days to understand it. Thank you ❤

@lyte69 1 year ago

Thank you for your great explanation and effort, this was very informative and honestly there are no problems with the video, it's only a preference for me if there was some code alongside each part explained so it's even better understood, but I want you to know that this was a huge help thank you again. ❤

@stephenlashley6313 8 months ago

Excellent! You should write a book on Transformers with the intuition offered in the video

@cristinaballesteros93 8 months ago

I have watched a lot of videos about transformers, and this is by far the best one. I finally understand how they work. Thank you so much!

@rajkrishnamurthy8474 1 year ago

Love it Umar. This is the best explanation of the paper. Thank you very much.

@scorpionobrien7633 1 year ago

Dear, I'm a French student and I don't understand English very well, but this video is awesome. I have never seen a video that explains the concept of transformers this well, and I want to say a big thank you for this. Sorry for my English.

@umarjamilai 1 year ago

You're welcome, my friend :D

@saima6759 7 months ago

transformer model never got so clear to me! thank you Umar!

@vassilisworld 1 year ago

amazing tutorial Umar. I finally understood the transformer.

@70152136 10 months ago

your presentation skill are simply amazing!!! best video on transformers I've seen so far

@subinaypanda9936 5 months ago

Your explanation just hits my mind. You explained all the points, where I was facing problems to understand. It's like you can read my mind from past huh 😜. Yes subscribed.

@sanskargupta7085 5 months ago

I feel lucky enough to have come across this channel, amazing stuff!

@laodrofotic7713 4 months ago

This is amazing! I just finished the whole video and I truly understand it now. Thank you Umar Jamil, you are the greatest!!!!!!!!!!!!

@parametaorto 1 year ago

Hi there! I watched it from start to end and written down all infos in my notebook, it was soooo interesting! Thank you for the explanation!! It was very clear and helpful!

@ThanhNguyen-h6r7o 11 months ago

Excellent video. I realized that there are thousands of videos trying to explain the Transformer; however, this is one of the most pedagogical ones, probably because Umar is a very experienced instructor. Coming from the same instructor background, I believe that it will be much easier for the audience to understand the Transformer by highlighting the two most critical points: 1. What is the data structure used to store the learnt parameters (weights, biases) after training, i.e. in this case nn.Embedding? Correct me if I am wrong. At the end of the training, there must be an array/tensor nn.Embedding with v (vocab_size) x v (vocab_size). This would store all the number values that can tell us about the position of the word and the relationships among words. 2. During inference, we have an input with a sequence of words (tokens) sent to the encoder. The decoder in turn will look up all the individual words in the model (the nn.Embedding tensor), then return the most probable next word. All other steps and math formulas are easy to grasp, but the data structure that stores the model is the most critical. Once we understand that structure, we can understand the other parts easily.
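Since the comment asks to be corrected on the shape: in the standard Transformer setup the input embedding table has one row per vocabulary entry and `d_model` columns, so its shape is vocab_size x d_model rather than vocab_size x vocab_size. It is also only one of many learned tensors (the attention and feed-forward weights are stored separately), and in the original paper the position information comes from fixed sinusoidal encodings added to the embeddings. Here is a minimal NumPy sketch of the lookup, with a random table standing in for trained weights; the sizes and token ids are illustrative assumptions.

```python
import numpy as np

vocab_size, d_model = 10_000, 512   # illustrative sizes (assumption)

rng = np.random.default_rng(0)
# Random stand-in for a trained embedding table: one d_model-sized row per token id.
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([5, 42, 7])        # hypothetical ids for a 3-token input
embedded = embedding_table[token_ids]   # the lookup is plain row indexing

print(embedding_table.shape)
print(embedded.shape)
```

Each input token becomes one `d_model`-sized vector; the relationships among words are then computed by the attention layers rather than read directly from this table.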

@MasterCodeAcademy 11 months ago

Yes, I agreed. Data structure is the most critical point to understand Transformer. All the math is just formula to follow.

@norlesh 1 year ago

Hi Umar, great videos - thank you. A suggestion to add for any future coverage of the material or as a follow up to this one would be to contrast how encoder-only (such as BERT) and decoder-only (such as GPT) differ during training and inference from the original encoder-decoder model.

@umarjamilai 1 year ago

Hi! I just published two videos on LLaMA, where I show how the Decoder-only model is trained and all the strategies (Greedy, Beam Search, Top K, Top P, Temperature, etc..) for inference. Have a look at them and have a nice day!