What are Transformer Models and how do they work?

  89,511 views

Serrano.Academy

1 day ago

This is the last in a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly examples.
Video 1: The attention mechanism at a high level • The Attention Mechanis...
Video 2: The attention mechanism with math • The math behind Attent...
Video 3 (This one): Transformer models
If you like this material, check out LLM University from Cohere!
llm.university
Get the Grokking Machine Learning book!
manning.com/books/grokking-ma...
Discount code (40%): serranoyt
(Use the discount code at checkout)
00:00 Introduction
01:50 What is a transformer?
04:35 Generating one word at a time
08:59 Sentiment Analysis
13:05 Neural Networks
18:18 Tokenization
19:12 Embeddings
25:06 Positional encoding
27:54 Attention
32:29 Softmax
35:48 Architecture of a Transformer
39:00 Fine-tuning
42:20 Conclusion

Comments: 137
@zafersahinoglu5913 4 months ago
Luis Serrano, this set of 3 videos explaining how LLMs and transformers work is truly the best explanation available. Appreciate your contribution to the literature.
@luizabarguil5214 2 days ago
A clear, concise and conclusive way of explaining Transformers! Congrats, and thank you so much for sharing it!
@anipacify1163 2 months ago
Best playlist on transformers and attention. Period. There is nothing better on YouTube. Goated playlist. Thank you so much!
@YannickBurky 2 months ago
This video is incredible. I've been looking for material to help me understand this mess for quite some time now, but everything about this video is perfect: the tone, the speed of speech, the explanations, the hierarchy of knowledge... I'm screaming with joy!
@lightninghell4 4 months ago
I've seen many videos on transformers, but this series is the first where I understood the topic at a deep enough level to appreciate it.
@sandhiyar8763 5 months ago
Absolutely worth the watch! The clarity in Luis's explanation truly reflects his solid grasp of the content.👍
@RuliManurung 27 days ago
What an awesome series of lectures. I spent 10 years teaching undergraduate-level Artificial Intelligence and NLP courses, so I can really appreciate the skill in breaking down and demystifying these concepts. Great job! I would say the only thing missing from these videos is that you don't really cover how the learning/training process works in detail, but presumably that would detract from the focus of these videos, and you cover it elsewhere.
@alidanish6303 6 months ago
Finally, the 3rd video of the series, and as usual with the same clarity of concepts as expected from Serrano. The way you have perceived these esoteric concepts has produced pure gold. I am following you and Jay from Udacity, and you guys have made a real contribution in explaining a lot of black magic. Any plans to update the Grokking series...?
@MohamedHassan-pv1xl 4 months ago
I feel super lucky to have come across your videos. Normally I don't comment, but I saw the 3 videos of the series and I'm amazed at how you explain complicated topics. Your efforts are highly appreciated.
@jazznomad 6 months ago
Very clear. You are a natural teacher
@amiralioghli8622 5 months ago
There is a plethora of videos available on the theoretical aspects of transformers, but there is a noticeable scarcity of content when it comes to their practical implementation. Furthermore, there is a notable absence of videos demonstrating how to implement transformers specifically for time series data. In light of this, it would be highly appreciated if you could devote some attention to the practical implementation of transformers, with a particular emphasis on their application to time series data.
@william_8844 5 months ago
I concur, I would really love to watch the vid
@calebadobah641 20 days ago
Yeah, don't know why people don't really post about such stuff
@usefbob 4 months ago
This series was great! Appreciate all the time and effort you've put into them, and how clearly you laid out the concepts 🙏🙏
@patriciasilvaoliveira6130 8 days ago
Amazingly clear and encouraging to learn more. Thanks, maestro!
@johnschut164 4 months ago
Your explanations are truly great! You have even understood that you sometimes have to ‘lie’ first to be able to explain things better. My sincere compliments! 👊
@karunamudliyar5625 5 months ago
The best video that I've watched on Transformers. Very clear explanation
@Tuscani2005GT 3 months ago
Seriously some of the best videos on the topic. Thank you!
@SkyRiderJavelin 4 months ago
What an excellent series on Transformers, it really did the trick!!! The penny has finally dropped. Thanks very much for posting; this is very useful content. I wish I had come across this channel before spending 8 hours doing a course and still not understanding what happens under the hood.
@shaktisd 4 months ago
Amazing series on Transformers. Never ever imagined the true rationale behind Q, K, V... it is actually clear after watching your video. Thanks a lot.
@utkarshkapil 4 months ago
Beautifully explained. Loved how you went ahead and also taught a bit of the prerequisites!
@jorovifi89 6 months ago
Great work as always, thank you. Keep them coming!
@nazmulhaque8533 6 months ago
Excellent presentation. Waiting to see more videos like this. I would request you to make a series about aspect-based sentiment analysis. Best wishes...
@faraazmohammed3693 3 months ago
Liked the video while watching, at 7:15. Crystal clear explanation. Good job, thank you Serrano; I appreciate your work.
@blueberryml 5 months ago
excellent -- clear and concise explanations
@nileshkikle8112 3 months ago
Dr. Luis - Thank you for taking all the effort to create these 3 videos. Explaining complex things in the simplest way is an art! And you have that knack! Great job! Been following your ML videos for years now and I always enjoy them. PS - Funny enough, while typing this comment, I'm being prompted to select the next predicted word! 🙂
@timothyjoubert8543 3 months ago
thank you for this series - wonderfully explained. 💯
@johnny1966m 4 months ago
Thank you Mr. Serrano, it was very educational and lectured in a very good way. In relation to positional encoding, the example with arrows gave me the idea that the purpose of this stage is to make sure that only the correct positions of words in the sentence cluster together, while the incorrect ones diverge, so that the neural network distinguishes their positions in the sentence during training.
@poussinet2 4 months ago
Thank you for these really high quality videos and explanations.
@aminemharzi7222 5 days ago
The best explanation I've found
@harithummaluru3343 3 months ago
Great explanation. Perhaps one of the best videos.
@AboutOliver 4 months ago
You have a skill for teaching! Thanks so much for this series.
@dragolov 4 months ago
Deep respect, Luis Serrano! Thank you so much!
@AmanBansil 19 days ago
Incredible - just found this channel and I am about to pore over all the videos. Thank you so much for your effort.
@chrisogonas 4 months ago
Well illustrated! Thanks for sharing.
@GiovanneAfonso 1 month ago
The best explanation available
@danieltiema 1 month ago
Thank you for explaining this so well.
@abdelrhmanshoeeb7159 5 months ago
Finally! I've been waiting for it for a month. Thank you a lot.
@markuskaukonen3903 4 months ago
Very nice stuff! This is the first time somebody explained clearly what large language models are. Especially the second video was very valuable for me!
@anuragdh 3 months ago
Thanks... for the first time (not that I've gone through a lot of them :)), I was able to appreciate how the different layers of a neural network fit together with their weights. Thanks for making this video with the example used.
@sohamlakhote9822 10 days ago
Thanks a lot man!!! You did a fantastic job explaining these concepts 🙂
@silvera1109 1 month ago
Great video, hugely appreciated, thank you Luis! 🙏
@Glomly 3 months ago
This is the BEST explanation ever you can find on the internet. I'm serious
@analagunapradas3840 2 months ago
Agree, definitely the BEST, as always ;) Luis Serrano
@lakshminarayanan5486 4 months ago
Hi Luis, excellent material, and you know how to deliver it to perfection. Thanks a lot. Could you please explain a bit more about positional encoding, and how the residual connections, layer normalization, and encoder-decoder components fit into the very same example?
@skbHerath 2 months ago
Finally, I managed to understand the concept clearly. Thanks!
@edwinma9933 5 months ago
This is amazing, it deserves 10M views.
@incognito3k 3 months ago
As always, amazing content. We need a book from Luis on GenAI!
@sathyanukala3409 2 months ago
Excellent explanation. Thank you.
@anupamjain345 24 days ago
Thanks! Never came across anyone explaining anything in such great detail; you are amazing!!!
@SerranoAcademy 23 days ago
@anupamjain345, thank you so much for your really kind contribution, and for your nice words!
@leenabhandari5949 6 days ago
Your videos are gold.
@vasimshaikh9857 6 months ago
Finally, the 3rd video is here 😮😅 Thank you sir, I have been waiting for this video since last month; every day I checked your channel for the 3rd video. Thank you so much sir, you're doing great work 👍
@ColinTimmins 5 months ago
Very nice! Love your work and visuals. =]
@samirelzein1095 5 months ago
The great Luis! I am recommending you in my job posts; your content is a prerequisite before working for us.
@SerranoAcademy 5 months ago
Wow thanks Samir, what an honor! And great to hear from you! I hope all is well on your end!
@samirelzein1095 5 months ago
@@SerranoAcademy The honor is mine! You are the artist of going inside it, seeing the wiring and connections, and delivering them as seen to all people. That's the job of prophets and saints. Bless you. I am doing great, plenty of text and image processing currently :) digitizing the undigitized!
@sriharsha580 5 months ago
Thanks for the wonderful presentation. In the previous video of the series, while discussing the relationship between query and key (building the context relationship between words), it was mentioned that the relationship between Q, K, and V (predicting the next word) would be covered in this video. May I know whether it will be covered in another video?
@panoskolyvakis4075 1 month ago
You're an absolute legend
@Omar-bi9zn 2 months ago
Fantastic videos!!
@jukebox419 6 months ago
You're the greatest teacher that ever lived in the history of mankind. Can you please do more videos regularly?
@SerranoAcademy 6 months ago
Thank you so much! Yes I'm definitely working hard at it. Some videos take me quite a while to make, but I really enjoy the process. :) If you have suggestions for topics, please let me know!
@andreibuldakov2641 5 months ago
Man, you are the best!
@ikheiri 4 months ago
Best video I've come across that explains the concepts simply. Helped tremendously in my learning endeavor to create a mental model for neural networks (there's a joke there somewhere)
@SerranoAcademy 4 months ago
Thanks! Lol, I see what you did there! :)
@joaomontenegro 4 months ago
These videos are great!! I would love to see one about the intuition of cross attention in, for example, the context of translation between two languages.
@SerranoAcademy 4 months ago
Thanks, great suggestion!
@maethu 4 months ago
I am happy like a zygote about this video! Great work, thanks a lot!
@EugenBurianov 1 month ago
Great, great video! Thank you
@BasmaSAYAH-vp7vo 3 months ago
Thank you 🙏🏻
@william_8844 5 months ago
I like the attention explanation
@emvdl 5 months ago
well done 👍
@htchtc203 3 months ago
Sir, thank you for a very clear and informative series of presentations. Excellent job! May I ask something about embeddings, or word2vec? How is a NN trained so that words cluster into similarity groups in a multidimensional vector space? Is this training process supervised, or is it more like a self-organizing map?
@wanggogo1979 5 months ago
Finally, I waited until this video was released.
@karstenhannes9628 1 month ago
Thanks for the video! I particularly liked the previous video about attention, super nice explanation! However, I thought most transformers simply use a linear layer that is also trained to create the embedding, instead of using a pre-trained network like word2vec.
@dayobanjo3870 2 months ago
Great video, speaking from Abuja capital of Nigeria
@SerranoAcademy 1 month ago
ohhhh greetings to Abuja!!! Nigerians are the kindest people, I hope to visit sometime!
@craigwood6561 2 months ago
Amazing
@alexanderzikal7244 18 days ago
Thank you very much for all your videos! What software do you use for your presentations? It all looks really nice, all the pictures…
@aspaksharif4376 1 month ago
Thank you..
@jayanthkothapalli9.2 1 month ago
Great videos sir, thank you for helping us increase India's GDP!! Sir, can you make videos on fine-tuning?
@TemporaryForstudy 6 months ago
Your videos are rocking as always. Hey, do you have any remote internship opportunities on your team or in your organisation? I would love to learn and work with you guys.
@SerranoAcademy 6 months ago
Thank you so much! Yes we have internships, check them out here! jobs.lever.co/cohere
@vankram1552 4 months ago
This is a fantastic video, by far the best on YouTube. My only feedback would be that the guitar music you use between chapters is a little abrasive and can take you out of the learning process. Maybe some calmer, more thought-provoking music along with more interesting title cards would be better.
@Aleks-ng3pp 3 months ago
In the previous video you said you would explain how to compute the Q, K, and V matrices in this one. But I don't see it.
@ifeanyiidiaye1889 6 months ago
Very awesome video. Thanks a lot! However, can you please suggest the Q&A dataset format for fine-tuning an LLM to answer questions? Is there a very specific format, especially if you are using a domain-specific dataset, or can a regular CSV file with columns ["Questions", "Answers"] be used for that purpose? I will appreciate any advice or recommendations you make 🙏
@SerranoAcademy 6 months ago
Thank you so much! That's a great question! Normally companies that train LLMs curate their own datasets, and I'm not sure exactly how those look. But here is one built publicly that looks pretty good! paperswithcode.com/paper/toolqa-a-dataset-for-llm-question-answering
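As a quick, hypothetical illustration of the commenter's CSV idea (the file name and the "Questions"/"Answers" column names come from their example, not from any official format), here is a minimal sketch of turning such a file into fine-tuning text:

```python
import csv

# Minimal sketch: read a hypothetical CSV with "Questions" and "Answers"
# columns and format each row as one training example string.
with open("qa_dataset.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

examples = [f"Question: {r['Questions']}\nAnswer: {r['Answers']}" for r in rows]
print(examples[0])
```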
@ifeanyiidiaye1889 6 months ago
@@SerranoAcademy Thanks a lot
@ohhbatu1303 4 months ago
good video
@Nerdimo 1 month ago
8:53 Do you think it's a plain feed-forward neural network, or something like an RNN (an LSTM, to be specific)? Just a thought.
@maneeshajainz 4 months ago
I like your videos. Could you post a quiz after each of your videos?
@arsalanzabeeb6467 5 months ago
And this 3rd video summed it all up so nicely that I am just in denial... is that it? Thank you so much
@user-mh1ip4er7x 4 months ago
Thanks!
@SerranoAcademy 4 months ago
Thank you so much for your kindness! Very appreciated. :)
@atriplehero 4 months ago
Does the whole "Once upon a time" built so far get fed into the whole process again as input, in order to get its next word/token attached? In other words, does it cycle again and again until a "seemingly complete" answer is generated? If this is the case, it would be a whole lot of inefficiency and would explain why so much electricity is consumed!! Please answer this crucial detail.
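For readers with the same question: yes, generation is autoregressive, exactly as the video's "Generating one word at a time" chapter describes; everything generated so far is fed back in and one token is appended per pass. A toy sketch of that loop (the stand-in "model" below is invented purely to show the mechanics):

```python
import numpy as np

vocab = ["Once", "upon", "a", "time", "."]

def next_token_probs(tokens):
    # Stand-in for a real model: ignores the words themselves and just
    # returns a one-hot distribution that walks through the vocabulary.
    return np.eye(len(vocab))[len(tokens) % len(vocab)]

tokens = []
for _ in range(5):
    probs = next_token_probs(tokens)  # the whole prefix goes in on every pass
    tokens.append(vocab[int(np.argmax(probs))])
print(" ".join(tokens))  # Once upon a time .
```

Reprocessing the growing prefix on every pass is indeed part of why inference is expensive; real systems cache intermediate attention results to reduce (but not eliminate) that cost.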
@techgayi 4 months ago
Excellent video; it would be good to indicate what the encoder-decoder model is in transformers. Couldn't figure that out here.
@SerranoAcademy 4 months ago
Thanks! Yes that's something I'm trying to make sense of, perhaps in a future video. In the meantime, this blog post is the best place to go for that: jalammar.github.io/illustrated-transformer/
@CreatingUtopia 4 months ago
Thanx
@AI-Kawser 5 months ago
awesome
@rasmusnordstrom9947 4 months ago
Is there any good explanation out there for the "second" input into the transformer structure: the outputs (shifted right) in the original paper?
@SerranoAcademy 4 months ago
Thanks for the question! Not sure exactly which second input you mean. The one coming out of the attention mechanism and into the transformer? I would say that that's an 'enhanced' vector for the input text, namely one that carries context with it. Let me know if that's what you meant, or if it was a different one.
@rasmusnordstrom9947 4 months ago
@@SerranoAcademy If I understand correctly when looking at Figure 1 in the original paper (Attention Is All You Need), the initial prompt is first fed into the encoder and then inserted halfway into the decoder, which finally yields the first token. As far as I understand, to generate the next token, we don't simply append the first token to the initial prompt and run it through both the encoder and decoder. Instead, we insert the newly generated token directly into the decoder (or run it through a different encoder?). I'm somewhat confused about this part.
@SatyaRao-fh4ny 4 months ago
This is a great video, clarifying a number of concepts. However, I am still not finding any answer to some of my questions. E.g. in this video, when the user enters "Write a story.", these are 4 tokens. But the "model" spits out a NEW word, "Once". Where is this NEW word coming from? How does the "model" even "KNOW" about such a word? Is it saved in some database/file? Is there a dictionary of ALL the words (or tokens) that the "model" has access to? And I guess the other question is what does "training a model" actually mean - on the ground, not just conceptually? After training, is the end result some data/words/tokens/embeddings saved in some file that the "model" "reads/processes" when it is used later on? What are parameters? I have watched several hours of videos, but have not found answers to these questions! Thanks for any help from experts!
@SerranoAcademy 4 months ago
Thanks, great questions! Yes, there is a database of tokens, and what the model does is output a list of probabilities, one for each token. The ones with high probability are the ones that are very likely to be the next in the sentence. So then one can pick a token at random based on these probabilities, and very likely you'll pick one that has a high probability (and that way, the model will not always answer the questions in the exact same way, but will have variety). The training part is very similar to a neural network's. It consists of updating the weights so that the model does a better job. So for example, if the next word in a sentence should be "apple", and the model gives "apple" a very low probability, then the backpropagation process updates the weights so that the probability of "apple" increases and all the other ones decrease. The parameters are the parameters of the neural network + the parameters of the attention matrices. If you'd like to learn more about neural networks and the training process, check out this video: kzbin.info/www/bejne/eIOcmWdtf9mkr9k
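A minimal sketch of the sampling step described in that reply, with a made-up five-token vocabulary and made-up scores:

```python
import numpy as np

# Toy vocabulary and raw scores (logits) a model might assign to the next
# token; both are invented for illustration.
vocab = ["apple", "banana", "the", "ran", "blue"]
logits = np.array([3.1, 0.2, 1.5, -0.7, 0.0])

# Softmax turns the scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Sample the next token: high-probability tokens are chosen most often,
# but not always -- which is why answers vary between runs.
rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=probs)
print(dict(zip(vocab, probs.round(3))), "->", next_token)
```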
@romanemul1 6 months ago
Thank you Luis G. Any future courses planned with Udacity?
@SerranoAcademy 6 months ago
Thank you, glad you liked it! Nothing with Udacity recently, but I did make this one with Coursera: www.deeplearning.ai/short-courses/large-language-models-semantic-search/
@romanemul1 5 months ago
@@SerranoAcademy Thanks. Enrolled!
@laavispamaya 5 months ago
Ooh, thanks for mentioning Emmy Noether
@SerranoAcademy 5 months ago
Yayy!!! Huge fan of Emmy! :)
@udaya2008 3 months ago
One confusion I had was comprehending the separation of the model-development phase from the scoring/inference phase in your explanation. My understanding is that you're explaining a lot of the inference parts and some of the development parts.
@nafisanawrin2901 1 month ago
While training the model, if it gives a wrong answer, how is it corrected?
@mrcool4172 3 months ago
Very well explained!! I want to understand: can a trained transformer model be further trained with a curated dataset to specialize as, say, a chatbot?
@SerranoAcademy 3 months ago
Thank you! Absolutely, what you would do in that case is fine-tuning. That is, you take a trained model and then post-train it with the data that you want; then it becomes better at answering from that dataset.
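As a rough sketch of that fine-tuning recipe (using PyTorch and the Hugging Face transformers library; the "gpt2" checkpoint and the one-example dataset are placeholders, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Placeholder curated dataset; a real one would have thousands of examples.
curated_texts = [
    "Q: What is a transformer? A: A neural network architecture based on attention.",
]

model.train()
for text in curated_texts:
    batch = tokenizer(text, return_tensors="pt")
    # For causal language models, the labels are the input ids themselves;
    # the library shifts them internally to score next-token predictions.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```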
@khaledbouzaiene3959 6 months ago
Nice, wow! But please, I still have a question: you didn't mention how words with similarities are placed close together in the embedding. I know that afterwards we assign the attention scores, but I don't get whether the embedding is a separate neural network, as in the video.
@tantzer6113 6 months ago
That closeness is achieved automatically in the end result because it’s more efficient. It isn’t something that the human designer plans for.
@SerranoAcademy 6 months ago
Yes, great question! The idea is to train a neural network to learn the neighboring words to a particular word. So in principle, words with similar neighbors will be close in the embedding, because the neural network sees them similarly. Then the embedding comes from looking at the penultimate layer in the neural network, which has a pretty good description of the words. So for example, the word 'apple' and the word 'pear' have similar neighboring words, so the neural network would output similar things. Therefore, at the penultimate layer, we'd imagine that the neural network must be carrying similar numbers for each of the words. The embeddings come out of here, so that's why the embeddings for 'apple' and 'pear' would be similar.
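A small numeric illustration of that idea: once the embedding is trained, word similarity can be read off the vectors with cosine similarity. The 4-dimensional vectors below are invented for illustration; real embeddings have hundreds of dimensions:

```python
import numpy as np

# Hypothetical embeddings: 'apple' and 'pear' share contexts, 'car' does not.
emb = {
    "apple": np.array([0.9, 0.1, 0.8, 0.0]),
    "pear":  np.array([0.8, 0.2, 0.9, 0.1]),
    "car":   np.array([0.0, 0.9, 0.1, 0.8]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(emb["apple"], emb["pear"]))  # ~0.99: similar contexts, close vectors
print(cosine(emb["apple"], emb["car"]))   # ~0.12: different contexts, far apart
```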
@khaledbouzaiene3959 6 months ago
@@SerranoAcademy Thanks for clarifying. I got confused because I expected one huge neural network composed of multiple layers of smaller neural networks, where the first one is the embedding layer, not a separate one. But generally everything makes sense now, no matter the design.
@VerdonTrigance 3 months ago
24:59 - again... who defines these layers and the network that sets the word vectors, and how? It all comes down to that. How do we know that cherry and apple have similar 'properties'?
@user-sm1re8xm5p 5 months ago
I'm a bit confused: to create (train) a neural net you need to give it input: embeddings. And then whatever the penultimate layer shows is the embedding. Isn't that pulling yourself out of the swamp by your own bootlaces?
@SerranoAcademy 5 months ago
Great question! Yes, it does look like a cycle the way it's shown, but these are trained at different times. Normally the embedding gets trained first, with a neural network that is trained to learn neighboring words. This is not a transformer, so it doesn't have attention layers. This NN is not good at generating text, but it gives a strong embedding. Once you have the embedding, then you train the transformer to generate sentences, with attention and all the other parts.
@user-sm1re8xm5p 5 months ago
@@SerranoAcademy Clear! Thanks!
@anthonymalkoun6188 4 months ago
What is the capital of Lebanon? Beirut! Cheers from Beirut, and thanks for the great video series :)
@SerranoAcademy 4 months ago
Thank you!!! Lots of greetings to Beirut!!! :)
@hansu7474 3 months ago
40:55 Who discovered abstract algebra? Galois
@tantzer6113 6 months ago
Given that the positional encoding is added to the word embeddings, how does the transformer learn to separate the combined positional and embedding signals?
@SerranoAcademy 6 months ago
Great question! That's a bit of a mystery. All that is clear to me is that positional encoding changes each word based on the ordering. The model is trained to learn these small disruptions in order to pick up the order of the words. How it combines it exactly with the embeddings is not very clear (at least to me), but positional encoding has worked well in practice, especially with lots of data and very large models.
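For reference, a minimal sketch of the sinusoidal positional encoding from the original "Attention Is All You Need" paper, which is simply added to the word embeddings so that the same word gets a slightly different vector at each position (toy sizes chosen for illustration):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (Vaswani et al., 2017)."""
    pos = np.arange(seq_len)[:, None]      # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]   # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)           # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)           # odd dimensions get cosine
    return pe

embeddings = np.random.randn(6, 8)          # 6 tokens, embedding size 8
inputs = embeddings + positional_encoding(6, 8)
```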
@tantzer6113 5 months ago
Thank you. It occurred to me that the question I asked, "how does it do it?", is the wrong one. The question should be: why might one expect it to be *possible* to do? The creators of the positional encoding must have had a rationale or intuition for the possibility. This is suggested by the specificity of the form the positional encoding takes.
@genkidama7385 1 month ago
When you give a command, most of the time it says "too complicated" and replaces code with comments.
@aidanthompson5053 5 months ago
31:08
@vil9386 3 months ago
Wow, the three videos you have put in this playlist - kzbin.info/www/bejne/hammoYqteah3fLM - make it possible to understand attention, which otherwise looked impossible. Thank you!