The Transformer neural network architecture EXPLAINED. “Attention is all you need”

87,958 views

AI Coffee Break with Letitia

Comments: 102
@AICoffeeBreak 7 months ago
🛑🪧Our remastered version of this video: kzbin.info/www/bejne/m5SceoSDnq91ntU 🪧 containing more about attention keys, queries and values!
@kevlyn24 3 years ago
This is my favourite video on transformers. Every time I want to revise the architecture quickly, I come here!
@AICoffeeBreak 3 years ago
Wow, this is huge and means a lot to us. Thanks, happy this helps.
@airepublic9864 2 years ago
Residual connection. I just understood why they sum those layers, thanks! So the main information doesn't get lost through the process.
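A minimal PyTorch sketch of that residual idea (the class name and the post-norm placement are illustrative assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Wraps any sublayer (attention, feed-forward, ...) with the
    residual sum used in the Transformer: out = norm(x + sublayer(x))."""
    def __init__(self, sublayer: nn.Module, d_model: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)  # Transformers pair residuals with LayerNorm

    def forward(self, x):
        # The untouched x is added back, so the original information
        # survives even if the sublayer's output is unhelpful.
        return self.norm(x + self.sublayer(x))

# Usage: wrap a feed-forward sublayer for 512-dimensional token embeddings
ffn = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
block = ResidualBlock(ffn, d_model=512)
out = block(torch.randn(2, 10, 512))  # (batch, sequence, embedding)
```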
@staffankonstholm3506 3 years ago
This is my favorite video on transformers
@AICoffeeBreak 3 years ago
Glad you like it this much!
@mgfos207 3 years ago
I agree with the general consensus in the comments: your explanation of transformers has been the best I've seen. Subbed!!
@AICoffeeBreak 3 years ago
Reading this comment makes Ms. Coffee Bean so happy!
@floriankowarsch8682 3 years ago
Thanks for your contribution! I think I just have to binge-watch all your videos before I can do anything else today.
@AICoffeeBreak 3 years ago
Haha, so happy to hear this! Luckily, the videos are short and not that many (yet).
@Katurha 4 years ago
I've been trying to wrap my head around this architecture for a loooong time... Thanks, Coffee Bean!
@AICoffeeBreak 4 years ago
So happy I made a difference! -- Coffee Bean
@DrFerencAcs 3 years ago
Finally got what is so special about this architecture. Thank you!
@AICoffeeBreak 3 years ago
Our pleasure! :)
@pranavkushare6788 3 years ago
This is the best-explained video on transformers I have ever watched!!! Amazing work. Please keep posting videos on such topics.
@AICoffeeBreak 3 years ago
Thank you! Ms. Coffee Bean will try her best!
@lgbpinho 3 years ago
By far the best explanation on transformers. I was actually on a coffee break O_o.
@AICoffeeBreak 3 years ago
Watching AI Coffee Break while working. 😀 Watching AI Coffee Break while on a coffee break: 🤯😱 Thanks for leaving this comment! Happy you found this video useful!
@vaishali_ramsri 4 years ago
Excellent explanations! Please keep the transformer and other state-of-the-art NLP videos coming ! Great quality.
@AICoffeeBreak 4 years ago
What OTHER state-of-the-art in NLP these days? :D Just kidding, but there is some truth in my question, since the Transformer has taken all the spotlight nowadays.
@vaishali_ramsri 4 years ago
@@AICoffeeBreak Agreed! I meant multimodal models, Google BiT, etc. Also, high-level differences between T5, BART, and GPT-2, and which kinds of tasks each excels at, would be very helpful to know! Thanks for making these videos. Just watched the multimodal video and it was extremely useful.
@AICoffeeBreak 4 years ago
@@vaishali_ramsri Thanks, these ideas are very interesting to me too! I will add them to my (already very long) list of ideas for videos.
@sfdv1147 3 years ago
SUBBED! I honestly enjoy this more than my favourite video game YouTube videos.
@AICoffeeBreak 3 years ago
Haha, thanks! Imagine what would happen if Ms. Coffee Bean started explaining ML concepts while gaming! 😱
@sfdv1147 3 years ago
@@AICoffeeBreak that would be the only live stream that I would pay money to watch haha
@aflah7572 3 years ago
Awesome! Thanks for this great video
@AICoffeeBreak 3 years ago
Our pleasure! Thanks for watching. 👍
@abail7010 3 years ago
Thank you very much for this easy explanation of the complex topic! :)
@Adarsh_R 3 years ago
Very good explanation
@AICoffeeBreak 3 years ago
Thanks for the appreciation. It is one of my earliest videos. 😅 I should do a remastered version of it.
@Adarsh_R 3 years ago
@@AICoffeeBreak Great, keep going! I am a newbie to the AI world and not a computer science person. Your videos are really awesome and easy to understand 😊
@prachigupta4610 3 years ago
Very nice explanation, easy to follow even for someone with almost no background in NLP
@AICoffeeBreak 3 years ago
Thanks, that was exactly the point! 😃
@ababoo99 2 years ago
Great video. I really like how you put the aspects of transformers in a practical context.
@AICoffeeBreak 2 years ago
Thank you very much!
@saminchowdhury7995 2 years ago
Such an incredible explanation. Are you using Manim? It's so smooth.
@AICoffeeBreak 2 years ago
Thanks! No manim, just PowerPoint's morph functionality. 😅
@AICoffeeBreak 2 years ago
Glad to see you around, hope you find something worth wasting your time on. 👍
@ayehninnkhine4483 4 years ago
Your explanation is very clear and precise. It helps me a lot. Thank you so much.
@tejasdhagawkar1468 4 years ago
Best explanation everrrr... Thank you so much.
@AICoffeeBreak 4 years ago
You're very welcome! Glad it helped!
@lookas2995 1 year ago
Easy to understand and good for beginners. Thanks for your high-quality video :P
@AICoffeeBreak 1 year ago
Thanks for passing by and leaving this nice comment! ☺️
@gregd6022 2 years ago
Holy smokes... your videos are clear and entertaining... wow, others could learn from you. The "pseudo" bossa nova at the end helps too, me having a Brasil connection ;))
@flossetti3246 2 years ago
Best explanation I've seen, thank you. Subbed.
@TenzinT 2 years ago
Very good explanation. Thanks!
@dinasaif9610 3 years ago
Thanks soooooooooooo much
@AICoffeeBreak 3 years ago
Welcome 😊
@nerisozen9029 3 years ago
Thanks! Best explanation on transformers I've seen so far, but I think I would have liked it if you dwelled more on the generation of the output and encoder-decoder attention. Cheers!
@AICoffeeBreak 3 years ago
Great suggestion! I am considering making a "Transformer Explained" remastered version, but I haven't found the time yet...
@nerisozen9029 3 years ago
@@AICoffeeBreak Thank you for the response! Best of luck
@thiswasme5452 3 years ago
Thank you, ma'am, for this beautiful explanation! Subscribed
@LowestofheDead 2 years ago
Subscribed because this channel keeps giving top quality explanations. Also I just realised the eyes on the coffee bean are shaped like a coffee bean 🤣
@AICoffeeBreak 2 years ago
What? I did not realize the CB eyes are shaped like beans. They are just circles. 🤣
@mak9856 3 years ago
Excellent explanation! Thank you!
@AICoffeeBreak 3 years ago
Ms. Coffee Bean is so glad it was helpful!
@dipankarrahuldey6249 3 years ago
Please keep up this amazing work, clearly explained. And a request: if possible, please add all your relevant videos to a playlist, so they can be followed easily (I found a playlist, though). Thanks a ton!
@AICoffeeBreak 3 years ago
Hey, thanks for the suggestion! Do I understand correctly that the existing playlists are not enough? :)
@dipankarrahuldey6249 3 years ago
@@AICoffeeBreak No, I just wanted to say: whenever you add a new video, please try to add it to the playlist. Your playlists have great content. Thanks again!
@AICoffeeBreak 3 years ago
Thanks for clarifying! I usually add things to the playlists if they belong together.
@mikemihay 4 years ago
Excellent video!
@ManikandtanCK 3 years ago
I am not sure if you have already made a video on this, but I would love to see you take two architecture paradigms, like CNNs and Transformers, and explain the differences, the reasons why one works better than the other in which situations, etc.
@AICoffeeBreak 3 years ago
Is this what you mean? kzbin.info/www/bejne/l3mapGmnjaqImcU Or this? kzbin.info/www/bejne/eofSeamjrNxlorM
@KeinNiemand 2 years ago
I wonder what the next big neural network architecture is going to be, what comes next after transformers?
@AICoffeeBreak 7 months ago
Maybe RNNs such as LSTMs, and State Space Models (we did an explainer here: kzbin.info/www/bejne/rKOpZICqfNx3Zrs)
@user-or7ji5hv8y 4 years ago
The high-level view is really helpful, but I think a video that explains the details with some intuition could be really useful as well.
@AICoffeeBreak 4 years ago
Noted. 😀
@bertchristiaens6355 4 years ago
Very nice explanation :p
@AICoffeeBreak 4 years ago
Glad you think so!
@kypy3900 4 years ago
I finally understand this architecture too :D. Thank you.
@AICoffeeBreak 4 years ago
I'm glad this video helped someone!
@goelnikhils 1 year ago
Hi Letitia, thanks for such great content. One question I have: when we use a Transformer encoder to encode a sequence and generate embeddings, what loss function does the Transformer use? For example, I am using a Transformer encoder to encode the sequence of user actions in a user session, to generate embeddings for my recommender system. Kindly answer.
@AICoffeeBreak 1 year ago
Hi! To predict the correct word, from all possible words over the (English) vocabulary, the model uses cross-entropy loss. But you can change the loss to adapt it to your problem.
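A hedged sketch of that loss in PyTorch (the shapes and names are illustrative assumptions, not the exact training code):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 30000, 512
hidden = torch.randn(8, d_model)              # encoder outputs for 8 positions
to_vocab = nn.Linear(d_model, vocab_size)     # projects each vector to word scores
logits = to_vocab(hidden)                     # (8, vocab_size)
targets = torch.randint(0, vocab_size, (8,))  # indices of the correct words

loss = nn.CrossEntropyLoss()(logits, targets)  # softmax + negative log-likelihood
loss.backward()
```

For a recommender system, one would swap the cross-entropy objective for something task-specific, e.g. a contrastive or ranking loss over the session embeddings.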
@GradientDude 4 years ago
Thanks Letitia! Very nice animations. I wonder which software you use to draw them?
@AICoffeeBreak 4 years ago
For the graphics and content animations I use PowerPoint, for animating Ms. Coffee Bean and video editing I use Kdenlive, and for drawing her I use Adobe Photoshop.
@GradientDude 4 years ago
@@AICoffeeBreak Impressive! Do you film the PowerPoint presentation with OBS?
@AICoffeeBreak 4 years ago
@@GradientDude yes, OBS it is! Forgot to mention.
@liammcfadden7760 4 years ago
What is the song at the end? I like your end music...
@AICoffeeBreak 4 years ago
Especially this end music is one of my favorites too! It is called "Malandragem" and you can find it in the KZbin Audio Library (under the KZbin Audio Library license).
@fashion2284 2 years ago
niiiiiiiiice love you
@zebcode 1 year ago
How can all the word embeddings be processed in parallel, when each sentence is a different length?
@ChengyiLi-t1k 4 months ago
This was very well explained. Thank you so much for this. I couldn't help but wonder: since these models are trained on data, how much data do you need for a model to be reasonably accurate? A quick Google search told me that GPT-4 is trained on approximately 500,000,000,000 pages of text, which is absolutely insane to me! I want to know if there are models we can develop that train on less data but still provide accurate results, and what do these models look like?
@AICoffeeBreak 4 months ago
Thanks a lot, especially since this is a very old video. We have made a new transformer explainer: kzbin.info/www/bejne/m5SceoSDnq91ntU About your question: unfortunately, in deep learning, great performance comes with big data, because the models only work well in the domains and kinds of data they have seen so far (in distribution). And the motto is: nothing will be out of distribution if we include the entire world in the training data, no? (This is a tongue-in-cheek comment, just flagging it.) 😅 So, if you are willing to sacrifice a lot of performance, there are models that can work with less data, going back to older NLP based on word embeddings and tf-idf representations. But I cannot say more until I know your specific use case. If you want a chatbot that can talk about almost anything, then you need trillions of tokens of text; at least this is what we learned from ChatGPT et al.
@ChengyiLi-t1k 4 months ago
Oh wow, I didn't even realize I was on the older video. I will definitely check out the new one, and thanks for your answer! The motivation for my question was: since we typically don't have a lot of data on endangered languages, could there be language models that produce helpful results in these languages despite the lack of data on them? I guess the broader question would be: what kinds of language models could we apply to endangered languages for things such as documenting them or aiding in that kind of research?
@AICoffeeBreak 4 months ago
@@ChengyiLi-t1k I'm not an expert in multilingual AI, but I have heard from experts there. Your question reminds me of two points. * In multilingual AI, people still try to scrape all the monolingual data they have, automatically produce back-translations, and then train a multilingual model that hopefully transfers its knowledge from high-resource languages to the low-resource one. But you need some decent amount of data from every language you aim to learn. We've made a video on this approach; find the link in the description. kzbin.info/www/bejne/Z5irhpyEgb6UaJI * If you have a very powerful model, of the class of GPT-4, Gemini, et al., then you can hope that the learned representations are strong enough to elicit with few-shot prompting. So, if you have the context length of Gemini, of multiple million input tokens, then you can many-shot a language from scratch by feeding in its dictionary and a grammar book. This is what Gemini 1.5 did for Kalamang: www.reddit.com/r/singularity/comments/1arla9z/gemini_15_pro_can_learn_to_zero_shot_translate_of/ It was meant as an out-of-distribution test, because the authors were sure that there is no trace of Kalamang on the internet that Gemini was trained on.
@ChengyiLi-t1k 4 months ago
@@AICoffeeBreak Thank you very much! I appreciate the response.
@osuregraz 3 years ago
Neat explanation; however, after watching the video I still don't understand how multi-head attention and encoder-decoder attention work, which are two important concepts I need to know.
@AICoffeeBreak 3 years ago
You're right. We plan a "Transformer remastered" video because, sure, we did not get to the most technical points and the math formulas. You might want to check out Yannic's video on it; he stays very close to the paper. kzbin.info/www/bejne/n3XYnZulhpejqNE
@bleacherz7503 1 year ago
Why are word vectors of length 512?
@AICoffeeBreak 1 year ago
That's their length in the original "Attention is all you need" Transformer. They can be longer, depending on how much memory one has available and how long one is willing to wait for a pass through the network. The longer the vectors, the slower the transformer.
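A quick sketch of that size/speed tradeoff with PyTorch's built-in encoder (hypothetical settings, just to show how the parameter count, and thus the cost per pass, grows with the embedding width):

```python
import torch.nn as nn

def encoder_params(d_model: int) -> int:
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=12)
    return sum(p.numel() for p in encoder.parameters())

for d in (256, 512, 1024):
    print(d, encoder_params(d))  # wider embeddings => more parameters, slower passes
```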
@zbynekba 2 years ago
Hi Letitia, great explanation, thank you. At 08:14 you mentioned encoder-decoder attention. That's one of the key places where the magic happens. Could you elaborate on it? Or maybe you have done so already in another of your videos?
@mak9856 3 years ago
Hi, I still don't understand what exactly the queries, keys, and values are in the paper "Attention is all you need". Can you help?
@AICoffeeBreak 3 years ago
Ms. Coffee Bean is planning a follow-up video on the Transformer going into more detail. Right now, the concepts you ask about were out of the scope of this explanation. Stay tuned!
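Until the follow-up, a minimal single-head sketch of what queries, keys, and values compute (illustrative PyTorch with assumed dimensions; the real model uses multiple heads, each with its own learned projections):

```python
import torch

def attention(q, k, v):
    # Each query scores every key; softmax turns the scores into weights
    # that mix the values -- the core operation of "Attention is all you need".
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(1, 10, 512)  # 10 token embeddings of width 512
wq, wk, wv = (torch.nn.Linear(512, 64) for _ in range(3))  # learned projections
out = attention(wq(x), wk(x), wv(x))  # self-attention: q, k, v all come from x
```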
@chainonsmanquants1630 2 years ago
Best video I found on transformers. I still don't know what the queries, keys, and values are in the decoder part.
@AICoffeeBreak 7 months ago
Maybe our remastered version of this video might help: kzbin.info/www/bejne/m5SceoSDnq91ntU We made it mainly because of your feedback, thanks a lot!
@PasseScience 2 years ago
Hi, thanks for the video! There are several things that are still unclear to me. First, I do not understand how the architecture is dynamic with respect to the size of the input: what changes structurally when we change the input size? Are there inner parts that should be repeated in parallel, or does the architecture fix a maximum window size that we hope will be larger than any input sequence? The other question is the most important one: every explanation of the transformer architecture I have found so far focuses on what we WANT a self-attention or attention layer to do, but never says a word about WHY, after training, those attention layers will do, by emergence, what we expect them to do. I guess it has something to do with the chosen structure of the data at the input and output of those layers, as well as the data flow, which is forced, but I have not had the revelation yet. If you could help me with those, that would be great!
@v1hana350 2 years ago
What is the meaning of fine-tuning and pre-training in Transformers?
@СергейЛис-с7т 2 years ago
Your video is very good and carries a broad message, thank you.
@hamedghazikhani9775 3 years ago
Thank you, it was very good. I hope you can improve your accent and speak a bit faster. Thanks again.
@d3v487 3 years ago
Amazing, love it 💗. Very intuitive explanation. Please speak slowly :)
@AICoffeeBreak 3 years ago
Thank you, I will try my best concerning the verbal pace! 😊 Does turning on the captions and setting the video speed to 0.75x maybe help you? The captions could help a lot; I uploaded them myself => they are not automatically generated. And even if they were, the Algorithm has gotten pretty good at speech-to-text.
Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.
9:40
AI Coffee Break with Letitia
73K views
Transformers explained | The architecture behind LLMs
19:48
AI Coffee Break with Letitia
29K views
AI Language Models & Transformers - Computerphile
20:39
Computerphile
332K views
MAMBA and State Space Models explained | SSM explained
22:27
AI Coffee Break with Letitia
57K views
The math behind Attention: Keys, Queries, and Values matrices
36:16
Serrano.Academy
276K views
Attention in transformers, step-by-step | DL6
26:10
3Blue1Brown
2M views
Attention Is All You Need
27:07
Yannic Kilcher
665K views
Visualizing transformers and attention | Talk for TNG Big Tech Day '24
57:45
Transformers (how LLMs work) explained visually | DL5
27:14
3Blue1Brown
4.3M views