Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 - Transformers and Self-Attention

  150,201 views

Stanford Online

1 day ago

Comments: 38
@irenejenna5725 · 1 year ago
This is what makes Stanford great. The guy giving the lecture is the guy who actually invented the technique.
@BiA-hg7qr · 3 days ago
He and 7 other people, okay? Don't forget them, okay?
@ersandy4u · 1 year ago
Oh my God! He is the father of modern AI and Machine Learning. Changed the world forever with that model used for ChatGPT.
@ARATHI2000 · 4 months ago
Amazing... Vaswani delivered this lecture in 2019 and became a legend in the tech community in 2022. Given his influence, he should get much more credit. If there were a Nobel Prize in this field, he would deserve it given his universal impact. 💪🏅
@oj_simpson · 1 year ago
This person changed the world. He is the one behind the AI revolution. ❤
@labsanta · 1 year ago
Key points:
- **[00:04]** Introduction of two invited speakers, Ashish Vaswani and Anna Huang, who discuss self-attention in generative models and its applications, especially in music.
- **[01:00]** Ashish Vaswani discusses self-attention, focusing on its application beyond specific models and its role in understanding the structure and symmetries in datasets.
- **[01:58]** The talk shifts to learning representations of variable-length data, underlining the importance of representation learning in deep learning.
- **[02:26]** Discussion of recurrent neural networks (RNNs) as the traditional models for sequence data, and their limitations in parallelization and capturing long-term dependencies.
- **[04:17]** Examination of the advantages of self-attention over RNNs, particularly in handling large datasets and efficiently summarizing information.
- **[05:14]** Comparison between self-attention and convolutional sequence models, highlighting the parallelization benefits and efficient handling of local dependencies in the latter.
- **[06:36]** Introduction of the idea of using attention for representation learning, leading to the development of the transformer model.
- **[07:34]** Explanation of how the properties of self-attention aid text generation, particularly in machine translation.
- **[08:58]** Background on previous work related to self-attention and its evolution toward the transformer model.
- **[10:23]** Description of the transformer model architecture, emphasizing components like the encoder, decoder, and positional representations.
- **[12:17]** Technical breakdown of the attention mechanism and its computational advantages, including speed and simplicity.
- **[13:43]** Discussion of the efficiency of the attention mechanism and its comparative performance against RNNs and convolutions.
- **[15:02]** Analysis of how attention mechanisms can simulate convolutions, and their application in language processing, particularly in understanding hierarchical structures.
- **[17:24]** Results of applying the transformer model to machine translation, demonstrating significant improvements over previous models.
- **[19:18]** Introduction of residual connections in transformers and their role in maintaining positional information.
- **[21:12]** Exploration of self-attention for modeling repeating structures in images and music, showcasing its versatility beyond text.
- **[25:27]** Discussion of adapting self-attention for image modeling, addressing the challenges and solutions for handling large image datasets.
- **[30:43]** Transition to Anna Huang's segment on applying self-attention to music generation, explaining the methodology and underlying principles.
- **[34:53]** Demonstration of the music transformer's capabilities, highlighting its effectiveness in maintaining coherence over longer sequences.
- **[38:19]** Discussion of the limitations of traditional attention models and the introduction of positional sinusoids for maintaining sequence structure.
- **[40:16]** In-depth explanation of relative attention and its benefits for handling long sequences, particularly in translation and music.
- **[44:35]** Insights into applications of relative attention in images, focusing on its ability to achieve translational equivariance, a key property in image processing.
- **[46:30]** Exploration of relative attention in graph-based problems and its connection to message passing neural networks.
- **[48:52]** Summary of the benefits of self-attention, including modeling self-similarity, translation equivariance, and applications to graphs.
- **[51:42]** Insights into the application of self-attention in transfer learning, scaling up models, and their utility in self-supervised learning and multitasking.

This summary encapsulates the key topics and insights from the lecture.
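The attention mechanism summarized above (see 12:17) can be sketched as scaled dot-product attention. A minimal NumPy sketch, not from the lecture itself; all names and shapes are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, the core of the transformer.

    Q, K: (seq_len, d_k) query and key arrays; V: (seq_len, d_v) values.
    Returns a (seq_len, d_v) array of attention-weighted values.
    """
    d_k = Q.shape[-1]
    # Compare every query against every key: an (seq_len, seq_len) score matrix.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over keys turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output position is a weighted average of the values.
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                   # 5 tokens, dimension 8
out = scaled_dot_product_attention(x, x, x)   # self-attention: Q = K = V = x
print(out.shape)                              # (5, 8)
```

In a real transformer layer, Q, K, and V are learned linear projections of the input rather than the raw embeddings, and several such heads run in parallel (the multi-head attention mentioned in the summary).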
@ikichiziki87 · 6 months ago
That's the legend standing right there, the guy who invented it!
@conscience580 · 7 months ago
"inductive biases you actually care about" - what an understatement!
@go4chaitu · 1 year ago
GPT's way of explaining self-attention:

"Let's imagine you are working on a story, and you have a bunch of sentences or ideas written down on separate cards. Each card represents a word or a piece of information. Now, to make your story more interesting, you want to see how each idea or word is related to the others. That's where self-attention comes in! Self-attention is like having a group of friends who help you decide which parts of your story are most important and how they connect to each other.

Here's how it works: Each card has three special markers: a question marker, a key marker, and a value marker. These markers help your friends understand what each card is about. First, your friends take one card and look at its question marker. It's like the friend is asking, 'What's important about this word or idea?' Then, they compare the question marker to the key markers on all the other cards. It's as if they are looking for similarities or connections between the different parts of the story. When they find a match, they give that card a score based on how related it is to the question. The higher the score, the more important it is!

After scoring all the cards, your friends use the scores to decide how much attention each card should get. It's like giving each card a special weight or importance based on the connections it has with the others. Finally, they take the value marker on each card and add up all the values, giving more weight to the cards with higher scores. This creates a new version of your story where each word or idea has been improved based on the attention it received. This process is repeated several times, with your friends going through the cards and adjusting the attention each time. This helps your friends understand more about how the different parts of your story fit together.

So, self-attention is like having friends who pay attention to each part of your story, figure out how they are connected, and make sure everything flows smoothly. It helps make your story more interesting and helps the model understand how words and ideas relate to each other in a text."
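The "markers" in this analogy correspond to three learned linear projections of the same input. A hedged sketch of the scoring-and-blending steps the analogy describes; the matrices here are random stand-ins for learned weights, and all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_k = 8, 8
X = rng.normal(size=(5, d_model))   # 5 "cards" (tokens), each an embedding

# Each card gets its question (query), key, and value markers via
# projection matrices -- learned in practice, random here for illustration.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Score each card's question against every card's key...
scores = Q @ K.T / np.sqrt(d_k)
# ...normalize the scores into attention weights with a softmax...
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
# ...then blend the value markers in proportion to those weights.
out = weights @ V

# Each card's attention over the other cards sums to 1.
assert np.allclose(weights.sum(axis=-1), 1.0)
```

The repeated passes the analogy mentions correspond to stacking several attention layers (and heads), each refining the representation produced by the previous one.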
@acasualviewer5861 · 1 year ago
What it is is relatively simple, but why it uses the query, key, and value quantities is a bit weird to me. I mean, I'm glad it works, but what was the reasoning behind doing it that way?
@grownupgaming · 2 years ago
I would be scared for my life if I was Ashish. Someone from the future might travel back in time and kill me.
@Ashish8363 · 1 year ago
Using MIDI notes as input, the same kind of model generating actual notes that are pleasing to humans is an insane capability. This is so next level.
@RARa12812 · 1 year ago
Here is the summary:
1:39 Looking for structure in the dataset
2:10 Variable-length data
2:43 The primary workhorse to date is the RNN ("How many people know RNNs?" Laughter... "If you don't know what an RNN is, how can you follow this lecture?")
3:49 Recommends "Pandemonium" by Oliver Selfridge
3:60 RNNs limit parallelization
5:00 Precursor to self-attention: convolutional sequence models
7:10 Compare words and make comparisons (explaining self-attention)
8:10 Multiplicative models are needed
9:19 Attention used within the confines of RNNs
9:53 Transformer model explanation
10:43 Encoder
11:59 Encoder/decoder mechanism
12:29 Encoder self-attention explanation
14:24 The attention mechanism is quadratic; it involves two matrix multiplications
14:50 Attention is attractive when the dimension is larger than the length
15:09 Attention is faster
15:22 Convolutions
16:00 Why one attention layer is not enough (multi-head attention)
16:50 Attention layer as a feature detector
21:50 Self-attention for images
24:28 Image transformer: replacing words with pixels
27:00 Rasterization
30:00 Next lecture: music
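The quadratic cost noted at 14:24, and the point at 14:50 that attention wins when the dimension exceeds the length, can be checked with a back-of-the-envelope cost model. This is a rough sketch of per-layer multiply-add counts, not figures from the lecture:

```python
# Approximate per-layer cost in multiply-adds for sequence length n,
# model dimension d (constant factors dropped):
#   self-attention: n^2 * d  -- building the n x n score matrix costs n*n*d,
#                               and so does blending the values
#   recurrent:      n * d^2  -- one d x d matrix multiply per time step
def self_attention_cost(n, d):
    return n * n * d

def recurrent_cost(n, d):
    return n * d * d

# Attention is cheaper exactly when the length n is below the dimension d.
n, d = 50, 512   # a typical sentence length vs. a typical model dimension
print(self_attention_cost(n, d) < recurrent_cost(n, d))   # True
```

For very long sequences (n much larger than d, as with the images and music discussed later in the lecture), the comparison flips, which is why the speakers discuss restricting attention to local neighborhoods.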
@VioletPrism · 1 year ago
Changed the world forever amazing.
@NishantSharmachannel · 5 months ago
He is from BIT Mesra (Birla Institute of Technology, Mesra), Ranchi!!! Proud alumni of our college.
@juanandrade2998 · 1 year ago
Interesting how the high note of this presentation was music; little did they know that all they needed was to scale the LM more... a lot more. But in all honesty, I don't think any of the authors of the paper believe that the "self-attention" mechanism is the only missing piece of the puzzle (AGI or whatever you call it), and no amount of data fed to the model will supplant that.
@laodrofotic7713 · 1 year ago
He is the true creator of ChatGPT. Leave it to the Americans to rebrand something and put a dollar sign on it... smh
@capravasranjan2121 · 8 months ago
Thanks for this wonderful lecture.
@tarlanahad · 2 years ago
Amazing Lecture. Thanks!
@danilo_88 · 1 year ago
This is cool. The inventor of transformers
@handsomemehdi3445 · 1 year ago
Is that a Turkish song at 14:07?
@krishnakantsharma6161 · 3 months ago
He is from India after all ♥️
@abrorabyyu6221 · 1 year ago
this person is badass
@munagalavenkatesh7166 · 2 years ago
Great lecture
@GenAIWithNandakishor · 1 year ago
He is Indian by origin. Very sad that the main author of "Attention Is All You Need" was never widely known!
@aimatters5600 · 1 year ago
damn the guy who created modern AI damn.
@oraz. · 1 year ago
There's the dude right there.
@PriyanshuAman-dn5jx · 9 months ago
The goat 🐐
@XShollaj · 1 year ago
What a legend
@omarrandoms4157 · 8 months ago
His explanation isn't clear; his mind is.
@Will-vq5oc · 9 months ago
I watch!
@jamesthesnake12 · 2 years ago
great
@BirdLindsay-s2w · 2 months ago
Jaleel Motorway
@JiacongMi · 2 years ago
Monash FIT5217 was here
@Teng_XD · 1 year ago
WCU?
@flyhighflyfast · 11 months ago
OG
@stevehaas9515 · 1 year ago
🦄
@TrollMeister_ · 8 months ago
I am listening to Ashish Vaswani’s fake / imitation accent. It’s amusing in a way. There are still huge gaps in how he pronounces words that belie his true (Indian) accent. Pay attention to how words are spoken here Ashish. You can’t learn without paying attention!