This is the most Italian ML lecture I have seen, god bless your mom
@alfcnz · 1 year ago
😮😮😮
@dustinvansteeus4904 · 3 years ago
Thanks Alfredo, amazing presentation on transformers. By far the clearest explanation I have seen on the internet. Kudos to you!
@alfcnz · 3 years ago
😃😃😃
@sk7w4tch3r · 3 months ago
Thanks!
@alfcnz · 3 months ago
🥳🥳🥳
@Vanadium404 · 10 months ago
Beautiful lecture by a beautiful person. Subscribed for this amazing explanation. Accept my respect and lots of love from Pakistan 🇵🇰
@alfcnz · 10 months ago
❤️❤️❤️
@alexvass · 2 years ago
Thanks
@alexvass · 2 years ago
Could you please clarify what the 'key' and the 'query' are in the context of the language example?
@alfcnz · 2 years ago
You’re welcome! 🤓🤓🤓
@drahmsha · 1 year ago
Thanks Alfredo for the amazing material and high-quality videos. I really enjoyed them, and they are literally my only guide in the field. My med school doesn't have any programs for AI in medicine, and it won't for a thousand years to come; people here will only get the chance for such courses and deep understanding from people who share their knowledge online like this. You are a life changer and a life saver. Thank you so much from your sincere remote student, thousands of miles away.
@alfcnz · 1 year ago
🥰🥰🥰
@Farhad6th · 2 years ago
Greetings! You explain things in great detail, sir. Thank you very much. Since I know you understand some Persian, I wrote this in Persian. :)
@alfcnz · 2 years ago
You're welcome 🥰🥰🥰
@dimitri30 · 4 months ago
Thank you so much, it's amazing how easy you make this to understand.
@alfcnz · 4 months ago
You're very welcome! 😀😀😀
@sur_yt805 · 2 years ago
I have been struggling with the concept of attention for so long and never came across such a heavenly explanation. I cannot thank you enough. I hope I can develop your level of expertise. Regards, Hina Ali
@alfcnz · 2 years ago
😀😀😀
@Cropinky · 1 year ago
bro got me googling what's a pizza boscaiola while trying to learn attention
@alfcnz · 1 year ago
🤣🤣🤣
@sidexside1439 · 2 years ago
I love your slides with the dark background! They're much easier on my eyes. Thanks!
@alfcnz · 9 months ago
😇😇😇
@erikdali983 · 3 years ago
Thanks for making these videos!
@alfcnz · 3 years ago
My pleasure!
@anadianBaconator · 3 years ago
honestly, thank you so much for the effort you take to explain the concepts! Really appreciate it
@alfcnz · 3 years ago
🥰🥰🥰
@goedel. · 1 year ago
At 25:42, why do we have t distinct attention vectors if the argmax X^T x does not change? I might be misunderstanding the argmax.
@alfcnz · 1 year ago
arg max returns a one-hot vector, specifying where the maximum element is. 🙂
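A tiny NumPy sketch of what I mean (my own toy numbers, not from the lecture notebook): every column xᵢ of X gives its own score vector Xᵀxᵢ, so the t resulting one-hot vectors stack into a t × t matrix.

```python
import numpy as np

np.random.seed(0)
t, n = 4, 3                      # t tokens, each of size n (toy values)
X = np.random.randn(n, t)        # one column per token

A = np.zeros((t, t))             # hard self-attention matrix, one column per query
for i in range(t):
    scores = X.T @ X[:, i]       # Xᵀxᵢ ∈ ℝᵗ, a different score vector for each i
    a = np.zeros(t)
    a[np.argmax(scores)] = 1.0   # arg max → one-hot vector
    A[:, i] = a

print(A)                         # t one-hot columns → A ∈ ℝ^{t×t}
```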
@goedel. · 1 year ago
@@alfcnz Just rewatched it and it makes sense. I was just confused by the notation: I read it as mapping the vector to its attention scalar and somehow getting an ℝ^{t×t} matrix. Thank you Alfredo 👍
@alfcnz · 1 year ago
No worries 😊😊😊
@juanolano2818 · 1 year ago
It would be great if you would explain this same topic today. I wonder how today's explanation would differ from the one you gave two years ago. Do you think you'd explain it differently? What would you change?
@alfcnz · 1 year ago
Hmm, the underlying principles are the same. Of course, there are new updates, but they are built on top of the foundations explained here.
@alirezajazini2476 · 10 months ago
very good introduction
@alfcnz · 10 months ago
Thank you 😊😊😊
@sidexside1439 · 2 years ago
1:01:25, the 'h' after the predictor: what exactly is that h? (A hidden representation?)
@me4447 · 2 years ago
This is such a good explanation!
@alfcnz · 2 years ago
❤️❤️❤️
@chani_m · 1 year ago
Hi! I was wondering what the beta in the formula at 22:17 means?
@alfcnz · 1 year ago
β is the coldness coefficient of the softargmax, which for the vector 𝒗 is defined as exp(β𝒗)/∑ᵢ exp(β𝓋ᵢ)
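In code, that's roughly the following (a minimal NumPy sketch of that formula, not the course implementation):

```python
import numpy as np

def softargmax(v, beta=1.0):
    """softargmax with coldness β: exp(βv) / ∑ᵢ exp(βvᵢ)."""
    e = np.exp(beta * (v - v.max()))   # subtract the max for numerical stability
    return e / e.sum()

v = np.array([1.0, 2.0, 0.5])
print(softargmax(v, beta=1.0))    # a soft distribution over the scores
print(softargmax(v, beta=100.0))  # large β (cold) → approaches a one-hot arg max
```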
@chani_m · 1 year ago
@@alfcnz Thank you!
@mahdiamrollahi8456 · 1 year ago
May I ask why we calculate the attention score between two vectors (like K, Q) and apply it to another vector (like V) to get the output? I mean, could we perform it with just two of them, like K and V? att_score = softmax(k·v / d); att_output = att_score * v
@alfcnz · 1 year ago
Think of it this way: V represents all YouTube videos, K is their titles. q is the question you have in mind. YouTube will try to match q with every title in K. When found, the correct video is retrieved from V. 🤓🤓🤓
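In pseudo-NumPy (my own sketch, with made-up sizes): the attention vector a says how well q matches each title, and the output is the corresponding mix of videos.

```python
import numpy as np

def softargmax(v, beta=1.0):
    e = np.exp(beta * (v - v.max()))
    return e / e.sum()

np.random.seed(1)
d, d_v, t = 4, 6, 5                 # key/query size, value size, number of items (toy)
K = np.random.randn(d, t)           # one "title" (key) per column
V = np.random.randn(d_v, t)         # one "video" (value) per column
q = np.random.randn(d)              # the question you have in mind

a = softargmax(K.T @ q)             # how well q matches each title; sums to 1
h = V @ a                           # retrieved content: a mix of the videos
print(a, h.shape)
```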
@mahdiamrollahi8456 · 1 year ago
@@alfcnz Very good example ✌ And it's OK when k and v are of different types (like here, where the title is text and the value is video, or in translation, where we have different languages for k (French) and v (English)). So, is it necessary to apply all three q, k, v to texts that come from the same language? In that case we could take the similarity between K and V (as the attention score) and then apply the result to V again. I am just thinking about optimizing parameters in the case where the data are of the same type, like English-to-English.
@zamiG74ever · 3 years ago
Hi Alf, first I want to thank you for all your highly explanatory YouTube and GitHub content; it has really helped me on many DL occasions. I would like to ask: what happens when d ≠ d' ≠ d''?
@alfcnz · 3 years ago
You're welcome 😀😀😀 Things would have a more meaningful size and you'd gain a degree of freedom.
@Batu135 · 2 years ago
You should decouple weight decay from the norm layers and embeddings ;)
@alfcnz · 2 years ago
???
@AdityaSanjivKanadeees · 2 years ago
Hi Alf, thanks for the great explanation!! A question: why do we set bias=False? Why do we not need an affine transformation of the input space, but just a rotation? @1:04:00
@alfcnz · 1 year ago
It's not strictly necessary. If you have a bias set to zero, then you force the information to be encoded in the orientation only. This makes sense, since later on we use the cosine similarity to measure agreement in terms of angles.
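A minimal PyTorch sketch of what that looks like (toy size, variable names are mine):

```python
import torch
from torch import nn

d = 8  # toy embedding size
# bias=False ⇒ purely linear maps: the information ends up encoded in the
# direction of the output, which is what the cosine similarity then compares
W_q = nn.Linear(d, d, bias=False)
W_k = nn.Linear(d, d, bias=False)
W_v = nn.Linear(d, d, bias=False)

x = torch.randn(d)
q, k, v = W_q(x), W_k(x), W_v(x)   # no offset added, only a linear map of x
```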
@MarioKostelac · 3 years ago
Thanks for this explanation, Alfredo. When you say "rotation and squashing", what do you mean by that? My understanding is that linear transformations can do more than just rotate and squash (e.g. shear, etc.).
@alfcnz · 3 years ago
You need to watch all videos 😉 The answer to your question can be found in practica 1 and 2.
@MarioKostelac · 3 years ago
@@alfcnz Hah, watched it. It makes sense now :)
@alfcnz · 3 years ago
Good, good! On a side note, our postdoc was googling "squashing" the other day 😅😅😅 I'd have never thought people would watch these out of order and therefore get confused by my terminology 😬😬😬
@MarioKostelac · 3 years ago
@@alfcnz Ah, I wanted to know more about transformers, but when you started talking about autoencoders, I figured I was missing more than just the transformer explanation.
@mrigankanath7337 · 3 years ago
For cross-attention, what change should be made in the code?
@alfcnz · 3 years ago
Watch the lecture again, specifically the cross-attention part.
@mrigankanath7337 · 3 years ago
@@alfcnz Hi Alf, I watched it again, and also the code. I understood that the length should be the same for q, k, v, but the input for q can have a different shape than the one for v, k. So if I use a linear layer whose output is d (the same for q, k, v), what is the input shape here, since the shapes differ? Should I declare a different linear layer for each of them?
@mrigankanath7337 · 3 years ago
No need to reply, I got it!!! I had to write all the dimensions in my notebook and then proceed. Anyway, thanks!
@alfcnz · 3 years ago
@@mrigankanath7337 Pay attention that q, k, v need *not* be the same size. I fear I need to recommend watching it once more. 🥺🥺🥺
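Something like this toy PyTorch sketch (sizes and variable names are made up; it's not the course notebook): the query input and the key/value input can have different embedding sizes and different lengths; only q and k get projected to a common size.

```python
import torch
from torch import nn

d_tgt, d_src = 6, 10      # toy sizes: the two input embeddings differ
d, d_v = 8, 12            # q and k share size d; v has its own size d_v
t_tgt, t_src = 3, 5       # the two sequences can also have different lengths

W_q = nn.Linear(d_tgt, d, bias=False)   # one projection per input, so different
W_k = nn.Linear(d_src, d, bias=False)   # input shapes are not a problem
W_v = nn.Linear(d_src, d_v, bias=False)

X_tgt = torch.randn(t_tgt, d_tgt)       # queries come from here
X_src = torch.randn(t_src, d_src)       # keys and values come from here

q, k, v = W_q(X_tgt), W_k(X_src), W_v(X_src)
a = torch.softmax(q @ k.T / d**0.5, dim=-1)   # (t_tgt × t_src) attention
h = a @ v                                     # (t_tgt × d_v) output
print(h.shape)
```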
@mrigankanath7337 · 3 years ago
@@alfcnz So Xq, Xk, Xv can have different shapes, right? But after passing through the linear layers we get q, k, v, which are of the same shape? Am I right? I am confused now; well, let's see your video again.
@AnkitSharma-lh6rf · 1 year ago
Isn't text-to-image a seq-to-vector?
@alfcnz · 1 year ago
If you’re asking questions about the video, plz include the time stamp, or it’s impossible for me to address your point.
@khaledelsayed762 · 3 years ago
Simply brilliant. However, it would help to provide a concrete example of using the attention mechanism on some simple, real input x, showing how Q, K, V evolve.
@alfcnz · 3 years ago
That's what the notebook is for, no? Or have I misunderstood your question? 😮😮😮
@khaledelsayed762 · 3 years ago
@@alfcnz Right, but I meant showing the examples within the lecture slides, or maybe that would be too lengthy.
@alfcnz · 3 years ago
I'll keep it in mind when/if I revisit this lecture. Thanks!
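In the meantime, here's roughly what such a toy walk-through could look like (my own made-up numbers, not from the slides):

```python
import torch
from torch import nn

torch.manual_seed(0)
t, d = 3, 4                                  # 3 tokens, embedding size 4 (toy)
x = torch.randn(t, d)                        # actual input: one row per token

W_q = nn.Linear(d, d, bias=False)
W_k = nn.Linear(d, d, bias=False)
W_v = nn.Linear(d, d, bias=False)

Q, K, V = W_q(x), W_k(x), W_v(x)             # how Q, K, V come out of x
A = torch.softmax(Q @ K.T / d**0.5, dim=-1)  # t × t attention scores
H = A @ V                                    # each output row is a mix of the values
print(Q, K, V, A, H, sep="\n")
```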
@pesky_mousquito · 3 years ago
I think the code and the notebook are missing the "predictor" + "decoder" part
@alfcnz · 3 years ago
I'm just using an encoder there. 😅😅😅
@thuantrangnguyenanh3899 · 3 years ago
I could not find the notebook in your repo :(.
@alfcnz · 3 years ago
Check last year's repo.
@thuantrangnguyenanh3899 · 3 years ago
Thank you very much.
@alfcnz · 3 years ago
@@thuantrangnguyenanh3899 you found it?
@thuantrangnguyenanh3899 · 3 years ago
Yes, it was lecture 15 last year.
@alfcnz · 3 years ago
@@thuantrangnguyenanh3899 👌🏻👌🏻👌🏻
@seokhoankevinchoi1182 · 3 years ago
Hello Alf, 0:22:34 everyone here probably knows what you are trying to say, but I think the matrix transpose animation should be fixed.
@alfcnz · 3 years ago
Fixed how?
@alfcnz · 3 years ago
Ah, you mean flipping over the diagonal?
@seokhoanchoi4218 · 3 years ago
@@alfcnz Yes! I think it's not a big issue for most people, but just in case…
@alfcnz · 3 years ago
I didn't even realise it should flip, haha 😅 In my mind I'm used to thinking of how the columns become the rows. Never actually thought about the mirroring, haha 😅 oops 😅
@my_master55 · 3 years ago
Wow, cool lecture, Alfredo, thank you so much! 🦄 BTW, is this implementation somehow related to the Facebook DeiT transformer? It looks like the part about stacking multiple attention + CNN layers was also implemented there 🐱🏍
@alfcnz · 3 years ago
No no, it's just the basics of self- and cross-attention.
@capybara_business · 1 year ago
Compared to other videos, I didn't understand anything; your notation is weird. Obviously I lack knowledge, but I was expecting something clearer. It would be great to add examples along the video. Btw, good work; it must have taken so many hours to build this up.
@alfcnz · 1 year ago
Which part was not clear? What notation do you find counterintuitive?
@kevon217 · 1 year ago
calling mom is always a good call
@alfcnz · 1 year ago
🥰🥰🥰
@НиколайНовичков-е1э · 3 years ago
Hello, Alfredo :)
@alfcnz · 3 years ago
👋🏻👋🏻👋🏻
@anondoggo · 2 years ago
31:48 I want pizza now :(
@anondoggo · 2 years ago
57:49 I'm not sure if I understand why we use self-attention first and then cross-attention. Where is h_i^{Enc} coming from? (The conditional information?)
@anondoggo · 2 years ago
OK, so I should have just watched on, as this is explained two seconds later, at 01:00:51 :'). So I'm pretty sure h_i is from Enc(x), so are we using h_i as keys and values and h_j as queries, so that we know which words in the source to pay attention to during translation? This amounts to providing the first t-1 y observations and getting the last t-1 observations.
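To check my understanding, a minimal sketch with nn.MultiheadAttention (my own toy tensors, causal mask omitted for brevity, not the course notebook):

```python
import torch
from torch import nn

d, t_src, t_tgt = 16, 7, 5            # toy sizes
h_enc = torch.randn(t_src, 1, d)      # stand-in for Enc(x), i.e. h_i^{Enc} (seq, batch, dim)
y_emb = torch.randn(t_tgt, 1, d)      # stand-in for the embedded first t-1 targets

self_attn  = nn.MultiheadAttention(d, num_heads=2)
cross_attn = nn.MultiheadAttention(d, num_heads=2)

h_dec, _ = self_attn(y_emb, y_emb, y_emb)   # q, k, v all from the target side
z, w = cross_attn(h_dec, h_enc, h_enc)      # q from decoder, k and v from h^{Enc}
print(z.shape, w.shape)                     # w: which source words each target step attends to
```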
@anondoggo · 2 years ago
I think this is true. Will double check with the slides.
@anondoggo · 2 years ago
Another amazing lecture by Alfredo. The fact that there is no dependency on time is so exciting; I finally understand transformers for the very first time. I am in tears. (At least I think I understand :'))
@alfcnz · 2 years ago
I'm glad you have acquired new knowledge 🤓🤓🤓
@soumyajitganguly2593 · 1 year ago
what I learnt from this video: the plural of pizza is not "pizzas"