🛑🪧 Our remastered version of this video: kzbin.info/www/bejne/m5SceoSDnq91ntU 🪧 with more about attention keys, queries, and values!
@kevlyn24 3 years ago
This is my favourite video on transformers. Every time I want to revise the architecture quickly, I come here!
@AICoffeeBreak 3 years ago
Wow, this is huge and means a lot to us. Thanks, happy this helps.
@airepublic9864 2 years ago
Residual connection! I just understood why they sum those layers, thanks: so the main information doesn't get lost through the process.
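For illustration, a minimal sketch of a residual connection, assuming PyTorch (the class and variable names here are made up for the example, not taken from the video):

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Wraps a sublayer so its input is added back onto its output
    (the 'Add' in the Transformer's Add & Norm step)."""

    def __init__(self, sublayer: nn.Module, d_model: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Summing keeps the original signal flowing through the network
        # even if the sublayer loses or distorts part of it.
        return self.norm(x + self.sublayer(x))

block = Residual(nn.Linear(512, 512), d_model=512)
out = block(torch.randn(2, 10, 512))  # (batch, sequence length, embedding size)
```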
@staffankonstholm3506 3 years ago
This is my favorite video on transformers
@AICoffeeBreak 3 years ago
Glad you like it this much!
@mgfos207 3 years ago
I agree with the general consensus in the comments: your explanation of transformers has been the best I've seen. Subbed!!
@AICoffeeBreak 3 years ago
Reading this comment makes Ms. Coffee Bean so happy!
@floriankowarsch8682 3 years ago
Thanks for your contribution! I think I just have to binge watch all your videos before I can do anything else today.
@AICoffeeBreak 3 years ago
Haha, so happy to hear this! Luckily, the videos are short and not that many (yet).
@Katurha 4 years ago
I've been trying to wrap my head around this architecture for a loooong time... Thanks, Coffee Bean!
@AICoffeeBreak 4 years ago
So happy I made a difference! -- Coffee Bean
@DrFerencAcs 3 years ago
Finally got what is so special about this architecture. Thank you!
@AICoffeeBreak 3 years ago
Our pleasure! :)
@pranavkushare6788 3 years ago
This is the best explanation of transformers I have ever watched!!! Amazing work. Please keep posting videos on such topics.
@AICoffeeBreak 3 years ago
Thank you! Ms. Coffee Bean will try her best!
@lgbpinho 3 years ago
By far the best explanation on transformers. I was actually on a coffee break O_o.
@AICoffeeBreak 3 years ago
Watching AI Coffee Break while working. 😀 Watching AI Coffee Break while on a coffee break: 🤯😱 Thanks for leaving this comment! Happy you found this video useful!
@vaishali_ramsri 4 years ago
Excellent explanations! Please keep the transformer and other state-of-the-art NLP videos coming ! Great quality.
@AICoffeeBreak 4 years ago
What OTHER state-of-the-art in NLP these days? :D Just kidding, but there is some truth in my question, since the Transformer has taken all the spotlight nowadays.
@vaishali_ramsri 4 years ago
@@AICoffeeBreak Agreed! I meant multimodal models, Google BiT, etc. Also, the high-level differences between T5, BART, and GPT-2, and which kinds of tasks each excels in, would be very helpful to know! Thanks for making these videos. Just watched the multimodal video and it was extremely useful.
@AICoffeeBreak 4 years ago
@@vaishali_ramsri Thanks, these ideas are very interesting to me too! I will add them to my (already very long) list of ideas for videos.
@sfdv1147 3 years ago
SUBBED! I honestly enjoy this more than my favourite video game KZbin videos
@AICoffeeBreak 3 years ago
Haha, thanks! Imagine what would happen if Ms. Coffee Bean started explaining ML concepts while gaming! 😱
@sfdv1147 3 years ago
@@AICoffeeBreak that would be the only live stream that I would pay money to watch haha
@aflah7572 3 years ago
Awesome! Thanks for this great video
@AICoffeeBreak 3 years ago
Our pleasure! Thanks for watching. 👍
@abail7010 3 years ago
Thank you very much for this easy explanation of the complex topic! :)
@Adarsh_R 3 years ago
Very good explanation
@AICoffeeBreak 3 years ago
Thanks for the appreciation! It is one of my earliest videos. 😅 I should do a remastered version of this.
@Adarsh_R 3 years ago
@@AICoffeeBreak Great, keep going! I am a newbie to the AI world and not a computer science person. Your videos are really awesome and easy to understand 😊
@prachigupta4610 3 years ago
Very nice explanation, easy to follow even for someone with almost no background in NLP
@AICoffeeBreak 3 years ago
Thanks, that was exactly the point! 😃
@ababoo99 2 years ago
Great video. I really like how you put the aspects of transformers into practical context.
@AICoffeeBreak 2 years ago
Thank you very much!
@saminchowdhury7995 2 years ago
Such an incredible explanation. Are you using Manim? It's so smooth.
@AICoffeeBreak 2 years ago
Thanks! No Manim, just PowerPoint's Morph functionality. 😅
@AICoffeeBreak 2 years ago
Glad to see you around, hope you find something worth wasting your time on. 👍
@ayehninnkhine4483 4 years ago
Your explanation is very clear and precise. It helps me a lot. Thank you so much.
@tejasdhagawkar1468 4 years ago
Best explanation everrrr.. Thank you soo much.
@AICoffeeBreak 4 years ago
You're very welcome! Glad it helped!
@lookas2995 1 year ago
Easy to understand and good for beginners. Thanks for your high-quality video :P
@AICoffeeBreak 1 year ago
Thanks for passing by and leaving this nice comment! ☺️
@gregd6022 2 years ago
Holy smokes... your videos are clear and entertaining... wow, others could learn from you. The "pseudo" bossa nova at the end helps too, me having a Brasil connection ;))
@flossetti3246 2 years ago
Best explanation I've seen, thank you. Subbed.
@TenzinT 2 years ago
Very good explanation. Thanks!
@dinasaif9610 3 years ago
Thanks soooooooooooo much
@AICoffeeBreak 3 years ago
Welcome 😊
@nerisozen9029 3 years ago
Thanks! Best explanation of transformers I've seen so far, but I would have liked it if you had dwelled more on the generation of the output and on encoder-decoder attention. Cheers!
@AICoffeeBreak 3 years ago
Great suggestion! I am considering making a "Transformer Explained Remastered" version, but I have not found the time yet...
@nerisozen9029 3 years ago
@@AICoffeeBreak Thank you for the response! Best of luck
@thiswasme5452 3 years ago
Thank you, ma'am, for this beautiful explanation! Subscribed.
@LowestofheDead 2 years ago
Subscribed because this channel keeps giving top-quality explanations. Also, I just realised the eyes on the coffee bean are shaped like a coffee bean 🤣
@AICoffeeBreak 2 years ago
What? I did not realize the CB eyes are shaped like beans. They are just circles. 🤣
@mak9856 3 years ago
Excellent explanation! Thank you!
@AICoffeeBreak 3 years ago
Ms. Coffee Bean is so glad it was helpful!
@dipankarrahuldey6249 3 years ago
Please keep up this amazing work, clearly explained. And a request: if possible, please add all your relevant videos to a playlist so they can be followed easily (I did find a playlist, though). Thanks a ton
@AICoffeeBreak 3 years ago
Hey, thanks for the suggestion! Do I understand correctly that the existing playlists are not enough? :)
@dipankarrahuldey6249 3 years ago
@@AICoffeeBreak No, just wanted to say that whenever you add a new video, please try to add it to the playlist. Your playlists have great content. Thanks again
@AICoffeeBreak 3 years ago
Thanks for clarifying! I usually add things to the playlists if they belong together.
@mikemihay 4 years ago
Excellent video!
@ManikandtanCK 3 years ago
I am not sure if you already made a video on this, but I would love to see you take two architecture paradigms, like CNNs and Transformers, and explain the differences: why one works better than the other, in what situations, etc.
@AICoffeeBreak 3 years ago
Is this what you mean? kzbin.info/www/bejne/l3mapGmnjaqImcU Or this? kzbin.info/www/bejne/eofSeamjrNxlorM
@KeinNiemand 2 years ago
I wonder what the next big neural network architecture is going to be; what comes next after transformers?
@AICoffeeBreak 7 months ago
Maybe RNNs such as LSTMs and State Space Models (we did an explainer here: kzbin.info/www/bejne/rKOpZICqfNx3Zrs )
@user-or7ji5hv8y 4 years ago
The high-level view is really helpful, but I think a video that explains the details with some intuition could be really useful as well.
@AICoffeeBreak 4 years ago
Noted. 😀
@bertchristiaens6355 4 years ago
Very nice explanation :p
@AICoffeeBreak 4 years ago
Glad you think so!
@kypy3900 4 years ago
At last, I too understand this architecture :D. Thank you.
@AICoffeeBreak 4 years ago
I'm glad this video helped someone!
@goelnikhils 1 year ago
Hi Letitia, thanks for such great content. One question I have: when we use a Transformer encoder to encode a sequence and generate embeddings, what loss function does the transformer use? For example, I am using a Transformer encoder to encode a sequence of user actions in a user session, to generate embeddings for my recommender system. Kindly answer.
@AICoffeeBreak 1 year ago
Hi! To predict the correct word from all possible words in the (English) vocabulary, the model uses a cross-entropy loss. But you can change the loss to adapt it to your problem.
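For illustration, a minimal sketch of that cross-entropy loss over vocabulary logits, assuming PyTorch (the vocabulary size and tensor shapes are made-up examples):

```python
import torch
import torch.nn.functional as F

vocab_size = 30000                             # hypothetical vocabulary size
logits = torch.randn(8, vocab_size)            # model scores for 8 predicted positions
targets = torch.randint(0, vocab_size, (8,))   # indices of the correct words

# Cross-entropy compares the predicted distribution over the whole
# vocabulary against the correct word at each position.
loss = F.cross_entropy(logits, targets)
print(loss.item())
```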
@GradientDude 4 years ago
Thanks Letitia! Very nice animations. I wonder which software you use to draw them?
@AICoffeeBreak 4 years ago
For the graphics and content animations I use PowerPoint, for animating Ms. Coffee Bean and video editing I use kdenlive, and for drawing her I use Adobe Photoshop.
@GradientDude 4 years ago
@@AICoffeeBreak Impressive! Do you film the PowerPoint presentation with OBS?
@AICoffeeBreak 4 years ago
@@GradientDude yes, OBS it is! Forgot to mention.
@liammcfadden7760 4 years ago
What is the song at the end? I like your end music...
@AICoffeeBreak 4 years ago
Especially this end music is one of my favorites too! It is called "Malandragem" and you can find it in the KZbin Audio Library (under the KZbin Audio Library license).
@fashion2284 2 years ago
niiiiiiiiice love you
@zebcode 1 year ago
How can all the word embeddings be processed in parallel? Each sentence is a different length.
@ChengyiLi-t1k 4 months ago
This was very well explained, thank you so much. I couldn't help but wonder: since these models are trained on data, how much data do you need for a model to be reasonably accurate? A quick Google search told me that GPT-4 was trained on approximately 500,000,000,000 pages of text, which is absolutely insane to me! I want to know if there are models we can develop that train on less data but still provide accurate results, and what these models look like.
@AICoffeeBreak 4 months ago
Thanks a lot, especially since this is a very old video. We have made a new transformer explainer: kzbin.info/www/bejne/m5SceoSDnq91ntU About your question: unfortunately, in deep learning, great performance comes with big data, because the models only work well on the domains and kinds of data they have seen so far (in distribution). And the motto is: nothing will be out of distribution if we include the entire world in the training data, no? (This is a tongue-in-cheek comment, just flagging it. 😅) So, if you are willing to sacrifice a lot of performance, there are models that can work with less data, going back to older NLP based on word embeddings and tf-idf representations; see the sketch below. But I cannot say more until I know your specific use case. If you want a chatbot that can talk about almost anything, though, you need trillions of tokens of text; at least, this is what we learned from ChatGPT et al.
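For illustration, a minimal sketch of the tf-idf representations mentioned above, assuming scikit-learn (the toy corpus is made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "transformers process all tokens in parallel",
    "older nlp pipelines used tf-idf document vectors",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)  # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())       # the learned vocabulary
print(doc_vectors.toarray())                    # tf-idf weights per term
```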
@ChengyiLi-t1k 4 months ago
Oh wow, I didn't even realize I was on the older video; I will definitely check out the new one, and thanks for your answer! The motivation for my question was: since we typically don't have a lot of data on endangered languages, could there be language models that produce helpful results in these languages despite the lack of data on them? I guess the broader question would be: what kinds of language models could we apply to endangered languages, for things such as documenting them or aiding in that kind of research?
@AICoffeeBreak 4 months ago
@@ChengyiLi-t1k I'm not an expert in multilingual AI, but I have heard from experts there. Your question reminds me of two points.
* In multilingual AI, people still try to scrape all the monolingual data they have, automatically produce back-translations, and then train a multilingual model that can hopefully transfer its knowledge from high-resource languages to the low-resource one. But you need a decent amount of data for every language you aim to learn. We've made a video on this approach; find the link in the description. kzbin.info/www/bejne/Z5irhpyEgb6UaJI
* If you have a very powerful model, of the class of GPT-4, Gemini, et al., then you can hope that the representations are strong enough to be elicited with few-shot prompting. So, with a context length like Gemini's, of multiple million input tokens, you can many-shot a language from scratch by feeding in its dictionary and a grammar book. This is what Gemini 1.5 did for Kalamang: www.reddit.com/r/singularity/comments/1arla9z/gemini_15_pro_can_learn_to_zero_shot_translate_of/ It was meant as an out-of-distribution test, because the authors were sure that there is no trace of Kalamang on the internet that Gemini was trained on.
@ChengyiLi-t1k 4 months ago
@@AICoffeeBreak Thank you very much! I appreciate the response.
@osuregraz 3 years ago
Neat explanation; however, after watching the video I still don't understand how multi-head attention and encoder-decoder attention work, which are two important concepts I need to know.
@AICoffeeBreak 3 years ago
You're right. We are planning a "Transformer remastered" video because, sure, we did not get to the most technical points and the math formulas. You might want to check out Yannic's video on it; he stays very close to the paper. kzbin.info/www/bejne/n3XYnZulhpejqNE
@bleacherz7503 1 year ago
Why are word vectors of length 512?
@AICoffeeBreak 1 year ago
That is their length in the original Transformer base model. They can be longer, depending on how much memory one has available and how long one is willing to wait for a pass through the network. The longer the vectors, the slower the transformer.
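For illustration, a minimal sketch of how the embedding length is a configurable trade-off, assuming PyTorch (the vocabulary size is a made-up example; 512 follows the original Transformer base model):

```python
import torch
import torch.nn as nn

vocab_size = 32000   # hypothetical subword vocabulary size
d_model = 512        # embedding length in the original Transformer base model

embedding = nn.Embedding(vocab_size, d_model)
tokens = torch.tensor([[5, 42, 7]])   # one toy sentence as token ids
vectors = embedding(tokens)           # shape: (1, 3, 512)

# Widening d_model grows every weight matrix in the model along with it,
# so longer vectors cost more memory and a slower pass through the network.
```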
@zbynekba 2 years ago
Hi Letitia, great explanation, thank you. At 08:14 you mentioned the encoder-decoder attention. That's one of the key places where the magic happens. Could you elaborate on it? Or maybe you have already done so in some other of your videos?
@mak9856 3 years ago
Hi, I still don't understand what exactly the queries, keys, and values are in the paper "Attention Is All You Need". Can you help?
@AICoffeeBreak 3 years ago
Ms. Coffee Bean is planning a follow-up video on the Transformer going into more detail. The concepts you ask about were out of the scope of this explanation. Stay tuned!
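In the meantime, for illustration, a minimal sketch of the scaled dot-product attention from "Attention Is All You Need", assuming PyTorch (the toy shapes are made up):

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # how well each query matches each key
    weights = torch.softmax(scores, dim=-1)            # one attention distribution per query
    return weights @ V                                 # weighted mixture of the values

# Toy self-attention: 3 tokens with vectors of size 4; Q, K, V all come
# from the same input (in the real model, via three learned projections).
x = torch.randn(3, 4)
out = scaled_dot_product_attention(x, x, x)
```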
@chainonsmanquants1630 2 years ago
Best video I found on transformers. I still don't know what the queries, keys, and values are in the decoder part.
@AICoffeeBreak 7 months ago
Maybe our remastered version of this video will help: kzbin.info/www/bejne/m5SceoSDnq91ntU We made it mainly because of your feedback, thanks a lot!
@PasseScience 2 years ago
Hi, thanks for the video! There are several things that are still unclear to me. First, I do not understand how the architecture is dynamic with respect to the size of the input: what changes structurally when we change the input size? Are there inner parts that need to be repeated in parallel, or does the architecture fix a maximum window size that we hope will be larger than any input sequence? The other question is the most important one: every explanation of the transformer architecture I have found so far focuses on what we WANT a self-attention or attention layer to do, but never says a word about WHY, after training, those attention layers will do, by emergence, what we expect them to do. I guess it has something to do with the chosen structure of the data at the input and output of those layers, as well as the data flow that is forced, but I have not had the revelation yet. If you could help me with those, that would be great!
@v1hana350 2 years ago
What is the meaning of "fine-tuning" and "pre-trained" in Transformers?
@СергейЛис-с7т 2 years ago
Your video is very good and carries a broad message. Thank you.
@hamedghazikhani9775 3 years ago
Thank you, it was very good. I hope you can improve your accent and speak a bit faster. Thanks again.
@d3v487 3 years ago
Amazing, love it 💗. Very intuitive explanation. Please speak slowly :)
@AICoffeeBreak 3 years ago
Thank you, I will try my best concerning the verbal pace! 😊 Does turning on the captions and setting the video speed to 0.75x maybe help you? The captions could help a lot; I uploaded them myself, so they are not automatically generated. Even if they were, the Algorithm has gotten pretty good at speech-to-text.