Rasa Algorithm Whiteboard - Transformers & Attention 1: Self Attention

  103,522 views

Rasa


Comments: 141
@deudaux · 1 year ago
The guy in the video not only understands the concept, he also understands what might be missing in other people's understanding, so he can fill in the gaps.
@joliver1981 · 4 years ago
I have watched tons of videos and finally an original video that actually teaches these concepts. There are so many YouTubers who simply make a video regurgitating something they read somewhere, but they don't really teach anything because they themselves don't really understand the idea. Bravo. Well done. I actually learned something. Thank you!
@RasaHQ · 4 years ago
(Vincent here) I just wanted to mention that I certainly sympathize. It's hard to find proper originality out there when it comes to teaching data science.
@WIFI-nf4tg · 3 years ago
@@RasaHQ Hi Rasa, can you also explain "how" we should express words as numbers for the vector v? For example, is there a preferred word embedding?
@RasaHQ · 3 years ago
@@WIFI-nf4tg Rasa doesn't prefer a particular word embedding, but a common choice is spaCy. Note that, technically, count vectors are also part of the feature space that goes into DIET.
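For concreteness, here is a minimal sketch of those two feature types: dense spaCy word vectors and sparse count vectors. It assumes the en_core_web_md spaCy model is installed and is only an illustration of possible input features, not Rasa's actual DIET featurization code.

```python
import numpy as np
import spacy
from sklearn.feature_extraction.text import CountVectorizer

# Dense features: pre-trained spaCy word vectors, one vector per token.
# Assumes `python -m spacy download en_core_web_md` has been run.
nlp = spacy.load("en_core_web_md")
doc = nlp("bank of the river")
dense = np.stack([token.vector for token in doc])   # shape: (4, 300)

# Sparse features: simple count vectors over the same text.
cv = CountVectorizer()
sparse = cv.fit_transform(["bank of the river"]).toarray()

print(dense.shape, sparse.shape)
```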
@briancase6180 · 1 year ago
Yeah, and some of those videos use a script generated by an AI that "read" the relevant sections of a paper. Get used to having to wade through tons of AI-generated content. We need laws that require AI-generated content to be labeled as such. But it's probably unenforceable. How much original content is enough to avoid the "AI-generated" label? ☺️
@homeboundrecords6955 · 1 year ago
TOTALLY agree, most 'education' vids really add to the confusion and just regurgitate jargon over and over.
@edouardthomasset6683 · 1 year ago
When the student understands the teacher, it means the teacher understood what he is explaining. I understood everything, unlike with the majority of other YouTubers on the same topic. Thanks!
@MrLazini · 10 months ago
I love how you use different colors to represent different dynamics between relationships. Such a simple idea, yet so good at conveying meaning
@guanxi99 · 4 years ago
After studying dozens of papers and videos, this is the first one that really made me understand the context. Many thanks for that!!! It also highlighted one fact for me: self-attention is a smart idea, but the real magic sauce is the way the word embeddings are created. That will decide whether the contexts created by self-attention make sense or not.
@deoabhijit5935 · 3 years ago
agree
@davidlanday2647 · 3 years ago
A good thought! That makes me wonder if there are metrics we can add to a loss function to assess how well a mechanism "attends" to words in a sentence. Like, if we look at the embedding space, we would probably want to see words that are contextually similar cluster close together. So I guess, some metric to assess how well an attention mechanism captures all contexts.
@parryhotter18 · 1 year ago
This. If a bit late 😊. Yes, the creation of an embedding, i.e. the creation of a vector for each word, seems to be where most of each word's semantics is stored. This video IS the best I have seen so far, in that he always explains WHY first and then HOW the next step works. Great approach!
@stanislawcronberg3271 · 1 year ago
Only 4 minutes in and I can tell this series will be a banger. Don't stop teaching, this is pure quality content, much appreciated!
@gomogovo4966 · 1 year ago
I've been looking for a clear explanation for so so so long. First one I've found. I think all the people who made explanatory videos so far have zero understanding of the subject. Thank you.
@alirezakashani3092 · 2 years ago
mind blowing how simple self-attention is explained - thank you
@DaveJ6515 · 1 year ago
Yes sir, this is it. You have nailed it: not only do you know the subject, you also know the art of creating the conditions for everyone else to get into it gradually and logically. Great.
@rommeltito123 · 3 years ago
I had so many doubts about the actual operation that happens in self attention. This video just cleared it. Excellent delivery in such a short time.
@fallingintofilm · 3 years ago
This was absolutely eye-opening. Congratulations sir! You win the Internet for a while.
@SiliconValleyRunner · 4 years ago
Best ever explanation of "self-attention". Awesome job.
@foxwalks588 · 3 years ago
This is the best explanation of the attention mechanism so far for a regular person like me! I came here after going through the Coursera NLP specialization and several papers, but only now am I actually able to see how it works. Seems like the embeddings themselves are the secret sauce indeed. Thank you.
@azurewang · 4 years ago
The most intuitive explanation I have ever seen!!! Excellent drawing and accent.
@mohammedmaamari9210 · 2 years ago
The clearest explanation of attention mechanisms I've ever seen. Thank you
@ferneutron · 3 years ago
Thank you so much for your explanation! When you said: "This is known as SELF ATTENTION". I just thought: BAM! Awesome job Rasa!
@thongnguyen1292 · 4 years ago
I've read dozens of papers and blog posts about this topic, but most of them just walk through the math without showing any intuition. This video is the best I've ever seen, thank you very much!
@binishjoshi1126 · 4 years ago
I've known about self-attention for some time; this is by far the most intuitive video I've ever seen, thank you.
@seanh1591 · 2 years ago
This is the best explanation of Self-Attention mechanism I've encountered after combing through the internet! Thank you!
@timohear · 3 years ago
Can't believe I've only stumbled upon this now. Fantastic original explanation.
@johnhutton5491 · 3 years ago
Since there isn't a like/dislike ratio anymore, for those wondering, this video is great
@avinashpaul1665 · 4 years ago
One of the best examples on the web that explain the attention mechanism. After reading many blogs I still had my doubts; the way attention is explained across time-series and text data is brilliant and helped me understand better.
@mmpcse · 4 years ago
Have gone through some 10-12 videos on self-attention. This Attention series 1, 2 & 3 is by FAR THE BEST EVER. Many thanks for these videos. [came back and updated this comment ;-) ]
@magpieradio · 3 years ago
This is the best video I have seen so far as to explain things so clearly. Well done.
@brianyoon815 · 2 years ago
This is incredible. After like 10 attention explanations, I finally get it here.
@vijayabhaskar-j · 4 years ago
This attention series is the clearest and most intuitive explanation of self-attention out there! Great work!
@trantandat2699 · 3 years ago
I have read a lot about this: papers, Medium posts, videos; this one gave me the best understanding. Very nice!
@galenw6833 · 1 year ago
At 11:29, the presenter says "cross product", but I think it's the dot product, so that each of the weights (W_11, etc.) is a number (with a cross product they would be vectors). Thus we can build a new vector from W_11, W_12, ... Great videos, exactly what I was looking for.
@timholdsworth1 · 3 years ago
Why did you use cross product at 11:31? Wouldn't that be making the weights small when the word embedding vectors are similar, which would then mean the related words in the sequence would be unable to influence the current state?
@Erosis · 2 years ago
I think he meant dot product? I don't know.
@briancase6180 · 3 years ago
OMG, this helped me immeasurably. Thanks so much. I just couldn't quite get it from the other explanations I've seen. Now I can go back and probably understand them better. Yay!
@tatiana7581 · 1 year ago
Thank you sooo much for this video! Finally, someone explained what self-attention is!
@akashmalhotra4787 · 3 years ago
This is really an amazing explanation! Liked how you build up from time-series and go to text. Keep up the good work :)
@blochspin · 2 years ago
Hands down the best video on the self-attention mechanism. Thank you!!!
@TheGroundskeeper · 1 year ago
Still the best explanation 3 years later
@sowmiya_rocker · 2 years ago
Beautiful explanation sir. I'm not sure if I got it all, but I can tell you that I got a better idea of self-attention from your video than from the other ones I watched. Thanks a lot 🙏
@mokhtarawwad6291 · 1 year ago
Thanks for sharing. I watched this based on a recommendation from a friend on Facebook, and I will watch the whole playlist. Thanks for sharing, God bless you 🙏 😊
@dinoscheidt · 3 years ago
Love the style. The more talent takes the time to teach new talent, the better. Very appealing style! Subscribed 🦾
@DrJohnnyStalker · 3 years ago
Best self-attention intuition I have ever seen. Andrew Ng level stuff!
@benjaminticknor2967 · 3 years ago
Incredible video! Did you mean to say dot product instead of cross product at 11:30?
@ParniaSh · 3 years ago
Yes, I think so
@hiteshnagothu887 · 4 years ago
Never have I ever seen such a great concept explanation. You just made my life easier, @Vincent!!
@dan10400 · 1 year ago
This is an exceptionally good explanation! Thank you so much. It is easy to see why the thumbs-up count is so high wrt views.
@simranjoharle4220 · 1 year ago
This is the best explanation of the topic I have come across! Thanks!
@pi5549 · 1 year ago
Your whiteboarding is beautiful. How are you doing it? I'd love to be able to present in this manner for my students.
@oritcoh · 4 years ago
Best Attention explanation, by far.
@Anushkumar-lq6hv · 1 year ago
The best video on self-attention. No debates
@skauddy755 · 1 year ago
By far the most intuitive explanation of self-attention. Disappointed, however, with the number of likes :(
@sebastianp4023 · 3 years ago
Please link this video from the TensorFlow docs. I spent a whole day trying to get behind the concept of attention, and this explanation is just beautiful!
@luisvasquez5015 · 1 year ago
Finally somebody explicitly saying that the distributional hypothesis makes no linguistic sense
@siemdual8026 · 3 years ago
This video is the KEY for my QUERY! Pun intended. Thank you so much!
@maker72460 · 2 years ago
Awesome explanation! It takes great skills to explain such concepts. Looking forward!
@uniqueaakash14 · 2 years ago
Best video I have found on self-attention.
@jhumdas4613 · 2 years ago
Amazing explanation!! The best I have come across to date. Thank you so much!
@giyutamioka9437 · 2 years ago
Best explanation I have seen so far.... Thanks!
@pranjalchaubey · 4 years ago
This is one of the best videos on the topic, if not the best!
@adrianramirez9729 · 2 years ago
Amazing explanation! I didn't find the comparison with time series all that illuminating, but the second part was really good :)
@suewhooo7390 · 3 years ago
Best explanation of the attention mechanism out there!! Thank you a lot!
@punk3900 · 7 months ago
The best explanation you can get in the world. Thanks! BTW, were you aware, at the time of making these videos, that transformers would be so revolutionary?
@Tigriszx · 2 years ago
SOTA explanation. That's exactly what I was looking for. [tr] If anyone is reading this: follow this guy, his explanations are legendary.
@ArabicCompetitiveProgramming · 4 years ago
Great series about attention!
@andyandurkar7814 · 2 years ago
A very simple explanation .. the best one!
@shivani404sheth4 · 3 years ago
This was so interesting! Thank you for this amazing video.
@louiseti4883 · 2 years ago
Great stuff in here. Super clear and efficient for beginners! Thanks.
@timmat · 4 months ago
Hi. This is a really great visualisation of weightings - thank you! I have a question though: at 11:30 you say you're going to calculate the cross product between the first token's vector and all the other vectors. Should this instead be the dot product, given that you are looking for similarity?
@QuangNguyen-jz5nl · 3 years ago
Thank you for sharing, great tutorial, looking forward to watching more and more great ones.
@hanimahdi7244 · 3 years ago
Thanks a lot! Really amazing, awesome, and very clear explanation.
@devanshamin5554 · 4 years ago
Very informative and simple explanation of a complicated topic. 👍🏻
@fadop3156 · 1 year ago
At 11:28, is it really the cross product and not the dot product of the vectors?
@alexanderskusnov5119 · 1 year ago
To filter (both in signals (low-frequency filtering) and in programming (a filter predicate over a vector)) means to keep, not to throw away.
@vikramsandu6054 · 2 years ago
Loved it. Very clear explanation.
@zzzyout · 1 year ago
11:25 stumble? Cross product or dot product?
@bootagain · 4 years ago
Thank you for posting this educational and useful video. Though I can't understand everything yet, I'll keep watching the rest of the series and trying to understand :) I mean it.
@norhanahmed5116 · 4 years ago
Thanks a lot, that was very simple and useful. Wishing you all the best.
@zeroheisenburg3480 · 3 years ago
At 11:33, do you mean dot product instead of cross product? If it's a cross product, isn't W11*V1 going to be 0 since they are perpendicular?
@RasaHQ · 3 years ago
(Vincent here) In general w_ij and v_k are not perpendicular. But you are correct that the multiplications here could be written more explicitly as a dot product.
@zeroheisenburg3480 · 3 years ago
@@RasaHQ Appreciate the reply. I brought it up since the video "verbally" said it was doing a cross product. So each w_ij value should be a scalar in this case? Thanks for the clarification.
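To make the scalar-weight point concrete, here is a minimal numpy sketch of the parameter-free self-attention described in the video, as I understand it: each weight w_ij is the dot product of two token vectors, each row of weights is normalized (softmax here; the video may normalize differently), and each output y_i is the weighted sum of all token vectors. The toy embeddings are made up for illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(V):
    """Parameter-free self-attention: V has shape (n_tokens, dim)."""
    W = V @ V.T            # w_ij = v_i . v_j  -> each weight is a scalar
    W = softmax(W)         # normalize each row so the weights sum to 1
    Y = W @ V              # y_i = sum_j w_ij * v_j
    return W, Y

# Toy 3-dimensional "embeddings" for the four tokens in "bank of the river".
V = np.array([[1.0, 0.2, 0.0],   # bank
              [0.1, 0.0, 0.1],   # of
              [0.1, 0.1, 0.0],   # the
              [0.9, 0.1, 0.3]])  # river
W, Y = self_attention(V)
print(np.round(W, 2))   # attention weights, one row per token
print(np.round(Y, 2))   # contextualised vectors
```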
@ashh3051 · 3 years ago
You are a great teacher. Thanks for this content.
@sgt391 · 3 years ago
Crazy useful video!
@sachinshelar8810 · 3 years ago
Amazing stuff. Thanks so much, Rasa team :)
@mohammadelghandour1614 · 2 years ago
Thanks for the easy and thorough explanation. I just have one question: how is "Y" now more representative or useful (more context) than "V"? Can you give an example?
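A hedged toy illustration of the difference (the weights below are made up, not taken from the video): before attention, v_bank is the same vector whether the sentence talks about a river or a vault. After attention, y_bank = 0.55*v_bank + 0.05*v_of + 0.05*v_the + 0.35*v_river, a blend that leans toward "river", so whatever reads y_bank sees a river-flavoured "bank" rather than the ambiguous one.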
@kevind.shabahang · 4 years ago
Excellent introduction
@raunakkbanerjee9016 · 1 year ago
At 11:28 you say cross product, but do you mean dot product?
@williamstorey5024 · 1 year ago
What is the re-weighing method that you used in the beginning? I would like to look it up and get more details on it.
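For anyone hunting for search terms: my reading (an assumption, not confirmed in the comments) is that the time-series part of the video shows a normalized weighted average over neighbouring points, i.e. kernel smoothing / a weighted moving average. A minimal sketch:

```python
import numpy as np

def reweigh(series, width=2.0):
    """Kernel-smoothing sketch: each output point is a normalized weighted
    average of all points, with nearby points getting the largest weights."""
    series = np.asarray(series, dtype=float)
    idx = np.arange(len(series))
    out = np.empty_like(series)
    for i in idx:
        w = np.exp(-((idx - i) ** 2) / (2 * width ** 2))  # Gaussian weights
        w /= w.sum()                                       # normalize to 1
        out[i] = w @ series                                # weighted average
    return out

print(reweigh([1, 5, 2, 8, 3, 7, 4]).round(2))
```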
@RaoBlackWellizedArman · 1 year ago
Fantastic explanations ^_^ Already subscribed!
@yacinerouizi844 · 4 years ago
Thank you for sharing, great tutorial!
@gainai_r27 · 3 years ago
This is awesome. What tool do you use to create this whiteboard?
@mohajeramir · 3 years ago
This was an excellent explanation. Thank you.
@vulinhle8343 · 4 years ago
Amazing video, thank you very much.
@ashokkumarj594 · 4 years ago
I love your tutorial 😙😙 Best explanation
@clivefernandes5435 · 4 years ago
So this is different from the version where we use a feedforward network, right? The one used by Bahdanau.
@subhamkundu5043 · 1 year ago
I have a query: in the video there is the sentence "Bank of the river"; now suppose there is another sentence, "I love YouTube videos a lot". Here the number of words is larger, so does the number of words matter?
@jmarcio51 · 3 years ago
I got the idea, thanks for the explanation.
@kumardeepankar · 2 years ago
@Rasa Wouldn't the curve go up and down instead of increasing continuously?
@Deddiward · 2 years ago
Wow this video is so well done
@SubhamKumar-eg1pw · 4 years ago
But in general the weights are trained, right?
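In the basic version shown in this video the weights come straight from dot products between the word vectors, so there is nothing to train; in the full transformer the weights are computed from learned query/key/value projections instead. A minimal sketch of that trainable variant (my own illustration with random matrices, not the video's code or Rasa's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4                      # embedding size of the toy example

# Trainable parameters: in a real model these are learned, here just random.
Wq = rng.normal(size=(dim, dim))
Wk = rng.normal(size=(dim, dim))
Wv = rng.normal(size=(dim, dim))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def trainable_self_attention(V):
    Q, K, Val = V @ Wq, V @ Wk, V @ Wv           # learned projections
    W = softmax(Q @ K.T / np.sqrt(dim))          # scaled dot-product weights
    return W @ Val                               # contextualised outputs

V = rng.normal(size=(5, dim))                    # 5 toy token vectors
print(trainable_self_attention(V).shape)         # (5, 4)
```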
@krishnachauhan2850 · 4 years ago
First time I'm seriously getting the intuition behind attention... but sir, I am confused: are people using this only in speech analysis, since that is time-series data like you introduced?
@roncahlon · 1 year ago
Why do you need to multiply by the word vectors a second time? I.e., why couldn't you just say Y1 = w11 + w12 + w13 + w14? What is the value of having it normalized?
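For what it's worth, a sketch of the distinction (not an official answer): the weights alone, w11 + w12 + w13 + w14, add up to a single number, so the embedding information would be lost entirely. Multiplying each weight by its word vector keeps the result in the embedding space: y1 = w11*v1 + w12*v2 + w13*v3 + w14*v4. Normalizing the weights so they sum to 1 makes y1 a weighted average of the word vectors, so its scale stays comparable to the original vectors regardless of sentence length.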
@TimKaseyMythHealer · 1 year ago
Trying to wrap my brain around LLM processing. It would be great if someone were to create a 3D flow chart of all layers, all attention heads. Zooming into each section as a single word and/or sentence is being processed.
@ishishir · 3 years ago
Brilliant explanation
@23232323rdurian · 1 year ago
The content of a word vector is all the OTHER words seen to statistically co-occur with it in corpora, weighted by their frequencies... so stopwords [the, a, is], because they are so frequent, don't contribute much topic/semantics, while content words are less frequent and so contribute more. The word vector ('meaning') for CAT is just the N words most frequently observed near CAT in corpora, discounted for frequency... This works great for cases like [king, queen] because they occur in similar contexts in corpora, but not for [Noah, cat], because that pairing is peculiar/local to this instance... and also not for co-references [cat, she], which are harder to resolve... you have to keep a STORY context, where presumably you might already have seen some reference to ...... And the co-references, well, they're just harder to resolve, though in this example they *HAVE* to resolve to either Noa or the cat, because those are the ONLY choices, and by chance (we assume) all three co-refer... ==> After all, there's a legit chance it isn't the cat in the example but the cat's MOM, who can be an ANNOYING MOM, yet nevertheless Noa is still a great cat...
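A minimal sketch of the co-occurrence idea in this comment (my own illustration with a made-up toy corpus; real embeddings also discount by frequency, which is omitted here): a word's "meaning" is approximated by counting which words appear near it within a small window.

```python
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the king greeted the queen",
    "the queen greeted the king",
    "noa is a great cat",
]

window = 2
cooc = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if i != j:
                cooc[(w, words[j])] += 1

# The raw "meaning" of a word = its most frequent neighbours.
word = "cat"
neighbours = sorted(((c, other) for (w, other), c in cooc.items() if w == word),
                    reverse=True)
print(neighbours)
```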
@saianishmalla2646 · 2 years ago
This was extremely helpful !!
@arvindu9344 · 7 months ago
Best explanation, thank you so much.
@distrologic2925 · 1 year ago
Love the format.