Transformer Neural Networks, ChatGPT's foundation, Clearly Explained!!!

748,610 views

StatQuest with Josh Starmer

A day ago

Comments: 1,300
@statquest 1 year ago
To learn more about Lightning: lightning.ai/ Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@NeoShameMan 1 year ago
Personally I find it clearer to link embeddings to hidden classes of words. I use character sheets as a metaphor, because what attention does is look not at the word but at the description in its sheet, with each attention head focusing on a different part of the description, which means a word representation has multiple attentions on different hidden classes. Then at the end we look at the sheets transformed at each layer to find the next word. That also helps explain multimodality, i.e. making sure image input and text input share the same description sheet.
@statquest 1 year ago
@@NeoShameMan Interesting.
@Tothefutureand 1 year ago
Transformers need more than one video, one for each part (multi-head attention, word embedding (sin & cosine similarity), training &…). I've been waiting a long time to reach the state of the art.
@statquest 1 year ago
@@Tothefutureand I thought about doing it that way - and that was the original plan. But my video on Attention convinced me that most people would rather have a single video that has everything in it all at once. However, I've provided links in this video's description to full length videos on each topic you are interested in.
@NeoShameMan 1 year ago
@@statquest Oh, you mentioned that you don't know why this number of heads was chosen. That's a hardware optimization, i.e. the heads can be split across GPUs or memory pools, or used to reduce bandwidth, so that they can be parallelized or computed sequentially on a resource-starved machine.
@jediknight120 1 year ago
As a Computer Science professor who teaches Machine Learning, this is probably my most anticipated video ever. I regularly use your videos to brush up on/review ML concepts myself and recommend them to my students as study aids. You explain these concepts in the clear, straightforward way that I aspire to. Thank you!
@statquest 1 year ago
Thank you! BAM! :)
@yizhou6877 1 year ago
Me too!
@Daigandar 1 year ago
@@statquest Our data analysis professor also uses your videos as references and recommends you almost every session haha. I learned about this amazing channel from him.
@statquest 1 year ago
@@Daigandar That's awesome! BAM! :)
@cienciadedados 1 year ago
Well said. I do the same!
@alefalfa 1 year ago
It's kinda hilarious that StatQuest videos give the impression they were meant for 5 year olds, yet are exploring legitimately complex topics. No jargon, no overcomplicated diagrams. Josh really tries to explain things and not show off his superior understanding of neural networks. Thanks Josh!
@statquest 1 year ago
Thank you! :)
@ran_domness 1 year ago
Much like Richard Feynman.
@williamarias815 9 months ago
BAM!
@MinChitXD 11 months ago
I've been learning machine learning for just a month; I'm a pure business major. I've been working as a Data Analyst intern for 2 months, and I believe machine learning will be essential if I want to go further in this industry. Of all the tutorial videos I've watched, yours gave me the clearest and most concise explanations of the concepts. The videos walked me through the series on neural networks, backpropagation, cross entropy with backpropagation, recurrent, LSTM and convolutional neural networks and, lastly, this video. I really appreciate your understanding and amazing storytelling; your content always makes me eager to keep learning machine learning myself. Thanks a lot
@statquest 11 months ago
Thank you very much! I'm glad my videos are helpful.
@aayushsmarten 1 year ago
This is the complet-est, precious-est, pur-est, brilliant-est video ever. Can't imagine how much work you've put into creating these illustrations. It's just brilliant. Hats off.
@statquest 1 year ago
Wow, thank you!
@lumiey 1 year ago
Did you just tokenize your comment?
@statquest 1 year ago
@@lumiey I'm not sure I understand.
@lumiey 1 year ago
@@statquest He just separated words like complet, est, precious, est, pur, est... like a tokenizer does (e.g. following -> follow, ing)
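A toy sketch of the suffix splitting described in the comment above. The `SUFFIXES` list and the `split_suffix` and `tokenize` functions are hypothetical names made up for illustration; real tokenizers typically learn their subword units from data rather than relying on a hand-written suffix list.

```python
# Toy illustration only: real tokenizers learn subword units from data.
SUFFIXES = ["est", "ing"]  # hypothetical, hand-picked suffix list


def split_suffix(word):
    """Split a word into [stem, suffix] if it ends with a known suffix."""
    for suffix in SUFFIXES:
        # Require a non-empty stem so that "est" on its own is not split.
        if word.endswith(suffix) and len(word) > len(suffix):
            return [word[: -len(suffix)], suffix]
    return [word]


def tokenize(text):
    """Lowercase, split on whitespace, then split off known suffixes."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(split_suffix(word))
    return tokens


print(tokenize("purest following video"))
# ['pur', 'est', 'follow', 'ing', 'video']
```

Each token would then get its own embedding, so "est" can be shared across many words.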
@aayushsmarten 1 year ago
@@lumiey Haha
@bobbymath2813 10 months ago
How a model like this was created is just beyond me. There’s so many different moving parts. You could write a whole book on the fully-connected network alone. Add in all the other stuff? Wow. Thank you, Josh, for explaining this so well!
@statquest 10 months ago
Thanks! It's a little easier to understand how this model was created in the first place if you follow the whole Neural Networks playlist. You'll see how things changed, one step at a time, to eventually end up with a transformer: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@bobbymath2813 10 months ago
@@statquest Thanks Josh! I’ll check out that playlist. What you’re doing is so special to the world, and humanity is so indebted.
@AmitBhor 1 year ago
22:12 8 heads because 8-GPU clusters are common, so the heads can be computed in parallel. The embedding dimension is 512, which leaves each head with a query size of 64. Great video 👍
@statquest 1 year ago
Awesome!
@TheTimtimtimtam 1 year ago
Thank you
@jakob2946 9 months ago
Does the second part mean that each head only gets a portion of the embeddings?
@oliviervangoethem9365 8 months ago
@@jakob2946 Curious as well. I looked it up and it seems that it's not true: every head is applied to all dimensions of the embedding. This also makes more sense to me, since the word embeddings should be looked at as a whole. Please correct me if I'm wrong.
@tekrunner987 8 months ago
@@oliviervangoethem9365 I don't know about more recent transformers, but in the initial architecture each attention head is applied to a projection of input embeddings, with reduced dimensionality (in the original "Attention is all you need" paper: embeddings have a dimension of 512, and each of the 8 attention heads has a dimension of 64). The reason for this is spelled out in the original paper: "Multi-head attention allows the model to jointly attend to information from different representation subspaces at different positions. With a single attention head, averaging inhibits this."
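A minimal sketch of the dimension bookkeeping described in the comment above, assuming the sizes it quotes (512-dimensional embeddings, 8 heads, 64 dimensions per head). The weight matrices are random stand-ins rather than trained values, and `random_matrix` and `project` are hypothetical helper names. The point is that each head reads the full 512-dimensional embedding and projects it down to 64 dimensions, instead of being handed a 64-dimensional slice.

```python
import random

D_MODEL = 512                  # embedding size quoted in the comment above
NUM_HEADS = 8
D_HEAD = D_MODEL // NUM_HEADS  # 64 dimensions per head

random.seed(0)


def random_matrix(rows, cols):
    """Untrained stand-in for a learned projection matrix."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def project(vector, matrix):
    """Multiply a row vector of length `rows` by a rows x cols matrix."""
    cols = len(matrix[0])
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(vector)))
        for c in range(cols)
    ]


embedding = [random.gauss(0.0, 1.0) for _ in range(D_MODEL)]

# One query projection per head; every head sees all 512 input dimensions.
query_weights = [random_matrix(D_MODEL, D_HEAD) for _ in range(NUM_HEADS)]
head_queries = [project(embedding, weights) for weights in query_weights]

# Concatenating the heads restores the original width.
concatenated = [value for head in head_queries for value in head]
print(D_HEAD, len(head_queries), len(head_queries[0]), len(concatenated))
# 64 8 64 512
```

The same pattern would apply to the key and value projections; they are omitted here to keep the sketch short.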
@fgfgdfgfgf 1 year ago
I've been looking for a tutorial about transformers for a long time. This is the smoothest tutorial. It does not hide any complexities (making me confident that I actually understand the concept instead of a dumbed-down version for mortals who won't ever end up using the knowledge), but it also does not get lost while explaining those complexities, and it clearly calls out what else I can learn about to understand the side concepts better. Super!!!
@statquest 1 year ago
Thank you very much! :)
@limitlesslife7536 7 months ago
You are a blessing for anyone who is a visual learner. You have the gift of being able to explain complex topics in an easy way.
@statquest 7 months ago
Thank you!
@midolion8510 1 year ago
I can't imagine how much effort it took for ai scientists to make this model. I really admire your illustration 😀
@statquest 1 year ago
Thank you so much 😀!
@urazc5917 1 year ago
This video is a treasure in a world where everything is explained in 2 minutes. Thank you Josh!
@statquest 1 year ago
Thank you very much!
@kosukenishio9670 1 year ago
For slowpokes like me: the example assumes a total vocabulary size of 4 for each language. Thanks Josh for providing some of the best content on the subject! Finally the K, Q, V made clear sense
@statquest 1 year ago
BAM! :)
@TheTimtimtimtam 1 year ago
Thank you from a fellow slowpoke
@REV_Pika 7 months ago
It's amazing how you fit a 2-hour lecture into just 30 minutes and explain it way better. After finishing this video and realizing what I just grasped, it's mind-blowing how you can make such a complicated subject easy to understand. Thank you very much!
@statquest 7 months ago
Glad it helped!
@CharlesPayne 7 months ago
Not to be a buzzkill, but I suffered a bad traumatic brain injury in my late 40s after being hit by an SUV while stopped on a motorcycle. I'm blessed I survived. At the time my job dealt with engineering and architecting IT solutions, and I was looking forward to advancing my career into AI and Machine Learning. I was in a coma for a while and I lost a lot of what I used to know. I now have learning disabilities and memory issues. I have improved some over the last years, but if I'm being honest with myself, I wouldn't want me as an engineer, so I'm trying to move into management. I'm glad I ran across these videos. I purchased the .pdf books and notebooks today and I can honestly say they are well worth it. Josh, I'm so glad you created this material. Your books and notebooks etc. are helping me slowly understand complex topics in hopes that I can stay relevant and continue to advance my career. Thanks again!
@statquest 7 months ago
TRIPLE BAM!!! Thank you so much for supporting StatQuest and I wish you the best as you continue to learn about ML and Data Science! :)
@maximeentsi2205 1 year ago
I tried hard to understand transformers a few months ago, and I can say that this video is a must-have. Thank you Josh
@statquest 1 year ago
Glad it was helpful!
@Cld136 1 year ago
Thanks, Josh, for keeping your promise to make a video about Transformers. I learned a lot and truly appreciate your effort in explaining this concept. I just placed an order to buy your book and made a donation to support the channel. I'm looking forward to more content on Machine Learning and hope to see videos about GPT and BERT models. ♥
@statquest 1 year ago
Thank you so much!!! I really appreciate your support (TRIPLE BAM!!!). I hope to do the GPT video soon, but we'll see - the timeline is a little out of my control right now.
@roshanbajaj3370 1 year ago
I tried very hard to understand transformers, and I got 100% satisfaction from your video. Your explanation is unbelievable. I enjoy studying very much whenever you teach a topic.
@statquest 1 year ago
Thanks and welcome!
@VeloFX 1 year ago
The explanations in your videos are incredibly precise and efficient at the same time. There is nothing better to watch when learning any ML topic! 👍
@statquest 1 year ago
Thank you very much! :)
@nilson_001 1 year ago
Thanks to your engaging visualization and clear explanation, I've grasped the Stanford CS224n course! Your content is neatly condensed but doesn't miss a thing. It's like you've taken all the complex concepts and served them up on a platter. Triple Bam!
@statquest 1 year ago
Congratulations! TRIPLE BAM! :)
@harryspeaks 1 year ago
Definitely the clearest walkthru of Transformer. It's very good that you put heavy emphasis on the parallelizability of Transformer since IMO it is the most important feature that made Transformer so useful
@statquest 1 year ago
agreed!
@20thwin 3 months ago
This is just amazing! I have no words. I came here to understand how transformers work and I am completely blown away
@statquest 3 months ago
Thanks!
@kurtosismusic 1 year ago
I just finished watching almost all the videos on this channel and I have to say that this is probably the best place to learn stats and machine learning. I also bought the ML book; it captures the essence of the style of teaching on this channel really well and is very handy to go back to and quickly look up some details. You are doing great work!
@statquest 1 year ago
Wow, thanks!
@meirgoldenberg5638 1 year ago
Which book?
@statquest 1 year ago
@@meirgoldenberg5638 I think he is referring to my book, The StatQuest Illustrated Guide to Machine Learning at statquest.org/statquest-store/
@Joy-dn8yz 1 year ago
Words cannot describe how happy I am to be able to watch this video. You really helped me with my studies. It is you who made me so interested in AI and think that I am actually able to understand what is going on. Thank you for your simplified models. They really help when learning more complex stuff on this or that theme. Every time there's a theme I do not know, the first thing I do is go to StatQuest. Thank you, Josh!
@statquest 1 year ago
Hooray!!! Thank you very much!
@isseym8592 1 year ago
As a computer science student getting into the field of NLP, I really can't thank you enough for making a video that breaks down the Transformer like this. Our uni doesn't go in depth about NLP related topics, and with the very brief explanation they do give, the uni expects us to have a full understanding of NLP. I can't thank you enough!
@statquest 1 year ago
Thanks!
@user-et8es9vg5z 7 months ago
I finally decided to buy your book thinking there'd be transformers in the "Neural Network" section. Even though there aren't, I'm glad it supports you. Your content is the best popularisation I've seen. It mainly helps me refresh and understand things better than before as I start my internship in AI after a gap year.
@statquest 7 months ago
I'm starting a book on neural networks very soon.
@georgl914 5 months ago
Thank you so much for this video. With more than 35 years in IT, having worked with punch cards, I started to believe I was too old for this new thing. It was not my age, but the way the knowledge is offered, which does not fit my learning style. Finding your video, listening, and gaining immediate insights, hearing the coins dropping, were only separated by minutes. Ordering your book is the least I could do to say thank you.
@statquest 5 months ago
TRIPLE BAM!!! Thank you very much for your support! :)
@ahmarhussain8720 4 months ago
This is without a doubt the best explanation of transformers for a beginner out there. Thank you, kind sir
@statquest 4 months ago
Glad you liked it!
@ЕгорБакланов-о8п 1 year ago
You're the only person on social media that can explain such complicated topics in an easy to understand manner. Keep it up!
@statquest 1 year ago
Thanks, will do!
@heike_p 10 months ago
I'm following an advanced master's in Artificial Intelligence. This whole NN playlist has saved me while studying for my exams! Thanks a bunch!
@statquest 10 months ago
good luck!
@daringcalf 1 year ago
Only videos like this can have "clearly explained" in the title.
@statquest 1 year ago
bam!
@gyuio100 1 year ago
Very clear and builds up the concepts in a step by step manner, rather than starting with the overall architecture.
@statquest 1 year ago
Thanks!
@berkk1993 1 year ago
I've spent a good deal of time studying attention, the critical concept behind transformers. Don't anticipate a natural understanding of the Q, K, and V parameters. We aren't entirely certain about their function; we can only hypothesize. They could still function effectively even if we used four parameters instead of three. One crucial point to remember is that our intuitive understanding of neural networks (NNs) is far from complete. The matrices for Q, K, and V aren't static; they're learned via backpropagation over lengthy training periods, thus changing over time. As a result, it's not as certain as mathematical operations like 1+1=2. The same applies to the head count in transformers; we can't definitively state whether eight is a good number or not. We don't fully grasp what each head is precisely doing; we can only speculate.
@GreenCowsGames 1 year ago
In visual transformers, we do understand what each head does. I guess heads trained on language are more difficult to interpret for us.
@nich.1918 1 year ago
@@GreenCowsGames no, we don’t know that they do.
@pratyushrao7979 10 months ago
I had never struggled so much with understanding a concept before. But you cleared all the doubts. Thank you!
@statquest 10 months ago
Glad it helped!
@pratyushrao7979 10 months ago
@@statquest I actually had a doubt as I was going through, about the decoder part. In the masked multi head attention part of the typical transformer, what inputs do we provide? And is this part only used during training?
@statquest 10 months ago
@@pratyushrao7979 I actually talk about masking in my video on decoder-only transformers here: kzbin.info/www/bejne/mIKYc6Klob1sd8k
@coolsai 1 year ago
BEST EVER VIDEO ABOUT CHAT GPT! I watched many videos but this video is just BAM!
@statquest 1 year ago
Thank you!
@emanelsheikh6344 1 year ago
I've searched a lot about the transformers but seriously this is the best explanation I've ever got. Amazing!❤
@statquest 1 year ago
Wow, thank you!
@TudorTatar-ny8zw 1 year ago
The positional encoding explanation truly was a BAM!
@statquest 1 year ago
Hooray! :)
@abdullahbaig7517 6 months ago
It's interesting how many different topics one has to contextualize to understand something like transformers! It's great to see all the math happening in detail and to stop and ponder from time to time while learning something as complex as transformers for the first time. It really helped build my intuition for a lot of the building blocks of a transformer-based neural network. Thank you!
@statquest 6 months ago
Glad you enjoyed it!
@rikki146 1 year ago
That is a lot of stuff in a single video!! For those who are wondering, ChatGPT is a decoder only neural network, and the main diff between an encoder and a decoder is that a decoder uses masked attention - thus ChatGPT is essentially an autoregressive model. Notice how ChatGPT generates a response in sequential order, from left to right. Anyway, good stuff!
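A minimal sketch of the masked attention mentioned in the comment above: position i may only attend to positions at or before i, which is what makes left-to-right generation possible. The score matrix is made up for illustration, and `causal_attention_weights` is a hypothetical name, not a library function.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_weights(scores):
    """scores[i][j] is the similarity of query i with key j.

    Future positions (j > i) are set to -inf before the softmax,
    so they receive exactly zero attention weight.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else -math.inf for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return weights


# Made-up similarity scores for a 3-token sequence.
scores = [
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 5.0],
    [0.5, 0.5, 0.5],
]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
# [1.0, 0.0, 0.0]
# [0.5, 0.5, 0.0]
# [0.333, 0.333, 0.333]
```

Note that the large score of 5.0 in the second row has no effect, because it belongs to a token that comes later in the sequence.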
@statquest 1 year ago
Yep - I'd like to make a GPT video just to highlight the explicit use of masking (the self attention in the decoder in this video used masking implicitly).
@technicalbranch99 1 year ago
@@statquest Please do that video soon :) BAM
@knanzeynalov7133 3 months ago
I'm just starting to learn about Machine Learning, and this video has been a very clear introduction for learning the concepts and moving on. Thanks for the great content, continuing with the side quests right now!
@statquest 3 months ago
Thank you very much!
@tupaiadhikari 1 year ago
Prof. Starmer, Thank You very much. You are an inspiration to all the aspiring Machine Learning Enthusiasts. Respect and Gratitude from India. #RESPECT
@statquest 1 year ago
Thank you very much!
@iankitveer 4 months ago
I am coming back to Josh after about 3 years. I love Josh Starmer's style of teaching, and it has not changed a bit. The good news is Josh is now teaching advanced Deep Learning concepts.
@statquest 4 months ago
bam! :)
@gvascons 1 year ago
And so we reach the state of the art!! Congrats Josh :D
@statquest 1 year ago
Hooray! :)
@eating_a_cookie 1 year ago
Triple bam.
@pw7225 1 year ago
2017...
@sdsa007 1 year ago
Transformers! More than meets the eye!? I think there is a lot of value in knowing this technology well! Thank you for your humor and learning support, I can't wait to return the favor!
@statquest 1 year ago
Thanks!
@MattBurkholder-l3u 1 year ago
Your neural networks playlist including this video gave me an intuitive understanding of transformers in less than a week which is something that would have taken an entire semester otherwise. I stumbled onto them while searching for a better understanding of Q,K,V, which everyone seems to say is as simple as querying a database…but what does that even mean?? Your explanations are brilliant, and I will be sharing with everyone I know who wants to learn more about this topic. I look forward to future videos. Thank you!
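One way to unpack the "querying a database" analogy raised in the comment above is a soft lookup: instead of returning the single value whose key matches exactly, attention returns a weighted average of all values, weighted by how well each key matches the query. This is a minimal sketch with made-up 2-dimensional keys and 3-dimensional values; `attend` is a hypothetical name, and the learned projections that produce Q, K and V in a real transformer are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output


# Made-up keys and values, chosen so the matches are easy to see.
keys = [[10.0, 0.0], [0.0, 10.0]]
values = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

# A query that lines up with the first key behaves almost like an exact lookup.
weights, output = attend([10.0, 0.0], keys, values)
print([round(w, 3) for w in weights], [round(o, 3) for o in output])

# A query that matches both keys equally returns a 50/50 blend of the values.
mixed_weights, mixed_output = attend([1.0, 1.0], keys, values)
print(mixed_weights, mixed_output)
# [0.5, 0.5] [0.5, 0.0, 0.5]
```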
@statquest 1 year ago
Thank you very much!!! I really appreciate it.
@iwokeupdead1093 1 year ago
I'm currently studying for job interviews and I don't know what I would do without you, thank you! When I get paid from my first job I will donate to you :)
@statquest 1 year ago
Wow! Thank you!
@wd8222 1 year ago
Best explanation I found on the whole Internet! Although I admit I needed 2 full passes. Well done, Josh!
@statquest 1 year ago
Thanks! - Yes, this video packs in a ton of information, but I couldn't figure out any other way to make it work.
@andrewdouglas9559 1 year ago
I don't know how I'd learn DataScience/ML without this channel. Thanks so much for doing what you do!
@statquest 1 year ago
Happy to help!
@matthewhaythornthwaite9910 1 year ago
Thanks Josh, another great video, I’ve been following your channel for years now and your videos have massively helped me to change career so huge thanks. On to the transformer network, there’s something about the positional encoding that makes me feel a little uneasy. It feels we’ve gone through great effort to train a word embedding model that can cluster similar words together in n-dimensional word embedding space (where n can be very large, often 1,000). By then applying positional encoding before our self-attention, whilst you very clearly explained with your example how important adding this information to the model is, seems to me to mess up all the effort we put into word embedding to get similar words clustered together. The word pizza, instead of being positioned in the same place can now jump around word/positional embedding space. Instead of one representation of pizza in space, it can now move around to be in many different positions, and not move locally around its own 'area' but because we add the positional encoding to the word embedding, scaled equally, it can jump around a great deal of space. To me it would seem adding this much freedom to where the word pizza can be represented in space would make it much much harder to train the model. Is my understanding correct or is there something I’m missing?
@statquest 1 year ago
I have a couple of thoughts on this. Maybe I should make a short video called "some thoughts about positional encoding". Anyway, here they are... Thought #1: Remember the positional encoding is fixed, so the word embedding values have to take them into account when training. For example, since all of the positional encoding values are between -1 and 1, it is possible that the word embedding values will have larger magnitudes and thus, not move around a lot when position is added to them. Thought #2: Because the periods of the squiggles get larger for larger embedding positions, after about the 20th position, the position encoding values end up alternating 1 and 0 (in other words, after the 20th position, the position encoding values are 1010101....) and it is in that space, from the 20th position to the 512th position (usually word embeddings have 512 or more positions) that the word embeddings are really learned, and that the first 20 positions are mostly just for position encoding.
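Both thoughts in the reply above can be checked with a small sketch of the sinusoidal encoding from the original "Attention is all you need" paper, assuming `d_model = 512`. The function name `positional_encoding` is a hypothetical choice. How early the values settle into the alternating pattern depends on the token position, so the sketch does not assume a specific cutoff; it only inspects the last few embedding dimensions for a token near the start of a sentence.

```python
import math


def positional_encoding(pos, d_model=512):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    encoding = []
    for dim in range(d_model):
        pair = dim // 2  # each sine/cosine pair shares one frequency
        angle = pos / (10000 ** (2 * pair / d_model))
        encoding.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
    return encoding


pe = positional_encoding(pos=3)

# Thought #1: every value lies between -1 and 1.
print(min(pe) >= -1.0 and max(pe) <= 1.0)
# True

# Thought #2: for the slowest squiggles the angle is tiny, so the sine
# entries are close to 0 and the cosine entries are close to 1.
print([round(v, 3) for v in pe[-4:]])
# [0.0, 1.0, 0.0, 1.0]
```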
@matthewhaythornthwaite9910 1 year ago
@@statquest Ah ok yeh that makes a lot of sense, thanks so much for taking the time to reply!
@matthewhaythornthwaite9910 1 year ago
I’ve been having some additional thoughts on this and think I may have another reason (or rather an example) why adding positional encoding to the word embedding vectors makes sense, Josh if you read this, feel free to shoot it down! Take the following sentence: “The weather is bad, but my mood is good”. In this sentence the first “is” refers to the weather, whereas the second "is" refers to my mood. Without positional encoding and only word embedding, the vector for “is” being passed into the attention unit will be the same for the two instances of the word in the sentence. If we don’t use masked self-attention and compare the word “is” to every word including itself in the sentence, then the output of the word “is” in the self-attention unit I believe should be the same for both instances. Therefore, the unit will struggle to successfully differentiate the relative meaning of the two words. By adding in positional encoding prior to the self-attention unit, we’re suddenly adding context to the word. The second “is” comes straight after the word “mood”, therefore the position vector we’re adding to each of the two words should be similar. However, because the word “weather” comes 6 words before the second “is”, the positional vector we add will be quite different. Presumably this difference helps a self-attention unit to differentiate the relative meanings of the two instances of the word “is”.
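The argument in the comment above can be checked with a small sketch. It assumes made-up 4-dimensional embeddings, a shortened sentence, and identity Q/K/V projections, all of which are simplifications for illustration; `self_attention` and `positional_encoding` are hypothetical names. Without positional encoding, the two copies of "is" produce exactly the same unmasked self-attention output; with it, they do not.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Unmasked self-attention with identity Q/K/V projections."""
    scale = math.sqrt(len(vectors[0]))
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]
        )
    return outputs


def positional_encoding(pos, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    return [
        math.sin(pos / 10000 ** (2 * (d // 2) / d_model)) if d % 2 == 0
        else math.cos(pos / 10000 ** (2 * (d // 2) / d_model))
        for d in range(d_model)
    ]


# Hypothetical embeddings; a trained model would learn these values.
embedding = {
    "weather": [1.0, 0.0, 0.5, 0.2],
    "is": [0.1, 0.9, 0.3, 0.4],
    "mood": [0.0, 0.2, 1.0, 0.7],
}
sentence = ["weather", "is", "mood", "is"]

plain = [embedding[word] for word in sentence]
with_position = [
    [e + p for e, p in zip(embedding[word], positional_encoding(i, 4))]
    for i, word in enumerate(sentence)
]

out_plain = self_attention(plain)
out_position = self_attention(with_position)

print(out_plain[1] == out_plain[3])        # True: the two "is" are indistinguishable
print(out_position[1] == out_position[3])  # False: position tells them apart
```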
@statquest 1 year ago
@@matthewhaythornthwaite9910 That all sounds reasonable to me! BAM! :)
@luckusters8568 8 months ago
@@matthewhaythornthwaite9910 Another reason why you would want to add positional encoding instead of doing something else is that it preserves the dimensionality of the encoding. Imagine a theoretical encoding which is not added (like a one-hot encoding for each sequence location), and some linear (or non-linear, for that matter) transform to combine word embedding and positional encoding. This is great in the sense that we do not pollute the embedding space with "arbitrary" offsets, but now our input sequence has to be of a fixed shape. Addition of orthogonal sinusoids guarantees a non-parametric, dimensionality-preserving encoding which does not fix the number of inputs we can give to the network. By the way, I think there is an analogy between adding positional encoding to embeddings and adding residual/skip connections to network outputs. Imagine that we have a network that is represented by the function f(x) and we have some target function F(x) which we want the network to learn. Imagine now that we modify our network to compute the function f(x) = h(x) + x (where "h(x)" is the network in front of the skip connection "h(x) + x"). Here too we pollute the output space of h(x) with the values of x. However, the network f can still learn F, so long as the network h(x) learns the function h(x) = F(x) - x (such that f(x) = h(x) + x = F(x) - x + x = F(x)). I suppose something similar holds for positional encoding (although it probably has to learn a much more difficult internal pattern), where the network f(E(x) + q) learns to associate word embedding values E(x) which are "convolved" by some known offsets q and probably learns to deconvolve E(x) and q (into some abstract representation). Given that E(x) + q may in theory be (nearly) non-unique (i.e. E(x_1) + q_1 ≈ E(x_2) + q_2), it might still be possible for the network to deconvolve the values into the correct inputs based on the context vector C, which is calculable from the rest of the input sequence.
I suppose one can't rule out that the network will sometimes get this wrong, but in practical terms, it seems to work well enough.
@jamemamjame 6 months ago
You are a one-of-a-kind ML teacher, and I don't think anyone can explain this like you. I'm thankful to have found your channel!
@statquest 6 months ago
Wow, thanks!
@amarug 3 months ago
I must say, after learning how GPT works I am even more blown away that this can work the way it works with the big online models. We just have to keep in mind that these companies are blowing tens of millions just for training. That's something else than me with an RTX 4090 and PyTorch spending 12 cents of electricity and thinking I have unfathomable power 😂
@statquest 3 months ago
:)
@sauravchandra10 5 months ago
It is so complex, I'll have to watch this 2-3 times. But you did much better than anyone else. Thanks.
@statquest 5 months ago
It can be made much simpler if you learn about neural networks first. You can do that with this playlist: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@apah 1 year ago
Man oh man the crazy timing... I just watched your video on attention yesterday!! TRIPLE BAAAAM you rock Josh, thanks :D
@statquest 1 year ago
BAM! :)
@jordanmuniz6167 7 months ago
Your videos have to be the best instance of teaching I have ever seen! Thank you for the amazing work!
@statquest 7 months ago
Thank you!
@tdv8686 1 year ago
OMG, I waited for it for so long!!, thank you, Josh!
@statquest 1 year ago
bam! :)
@vinny2688 1 year ago
THIS is what I've been waiting for!
@statquest 1 year ago
Thanks!
@prathameshdinkar2966 11 months ago
So nicely explained! I have searched for "how transformers work" but no one on YouTube explained both the concepts and the math! Keep the good work going 😁👍
@statquest 11 months ago
Glad you liked it!
@williamflinchbaugh6478 1 year ago
Great video! I'd love to see a PyTorch + Lightning tutorial on transformers similar to the LSTM video!
@statquest 1 year ago
That's the plan!
@torazis3286 10 months ago
I like how he says "In this example we kept things super simple". Great video, thank you!
@statquest 10 months ago
Glad you liked it!
@vladimirmihajlovic1504 1 year ago
Hey @statquest - here is a quick suggestion. Another convenient way to explain positional encoding might be to draw a clock with a minute hand and an hour hand. Then, instead of sin() and cos() functions, you could simply track the x and y coordinates of the tips of the minute and hour hands. It gives a much more convenient intuition for the mechanics of the encoding: (a) it shows its repetitive nature; (b) it ties encoding position to a sense of time (which is intuitive, since speech is tied to time as well, and speech is the most common way we use language); (c) it explains why we use both sin() and cos() functions (to track the circular motion of the clock hand); (d) it provides intuition for why having two pairs of sin() and cos() functions is better than just one.
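The clock suggestion above can be sketched directly. Here `clock_encoding` is a hypothetical function that encodes a step count as the (x, y) tip coordinates of each hand, with the hand periods (60 steps for the minute hand, 720 for the hour hand) chosen to mimic a real clock. It illustrates point (d): one hand alone cannot tell step 5 from step 65, but two hands with different periods can.

```python
import math


def clock_encoding(step, periods=(60, 720)):
    """Encode a step as (x, y) tip coordinates of one clock hand per period.

    x = sin(angle) and y = cos(angle) trace a clockwise circle starting at
    12 o'clock, so each hand contributes one sine/cosine pair.
    """
    code = []
    for period in periods:
        angle = 2 * math.pi * step / period
        code.extend([math.sin(angle), math.cos(angle)])
    return code


# With only the minute hand, steps 5 and 65 look the same.
minute_only_a = clock_encoding(5, periods=(60,))
minute_only_b = clock_encoding(65, periods=(60,))
print(all(abs(a - b) < 1e-9 for a, b in zip(minute_only_a, minute_only_b)))
# True

# Adding the hour hand separates them.
both_a = clock_encoding(5)
both_b = clock_encoding(65)
print(max(abs(a - b) for a, b in zip(both_a, both_b)) > 0.1)
# True
```

The sinusoidal encoding in a transformer follows the same idea, with many "hands" whose periods grow geometrically instead of two hands with fixed periods.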
@statquest 1 year ago
That's a great idea!
@Ali-Aslam 1 year ago
So kind of like a unit circle?
@vidbot4037 1 year ago
HE HAS DONE IT YET AGAIN!
@statquest 1 year ago
Thanks!
@mostafamarwanmostafa9975 5 months ago
Thank you sir for this amazing video. It helped me last year in my NLP exam, and now I'm refreshing my knowledge of transformers, hoping to land an interview soon!
@statquest 5 months ago
Good luck - let me know how it all goes.
@abdoualgerian5396 1 year ago
We want more NLP material please, tiny bam!
@statquest 1 year ago
:)
@jessiondiwangan2591 1 year ago
(Verse 1) Here we are with another quest, A journey through the world of stats, no less, Data sets in rows and columns rest, StatQuest, yeah, it's simply the best. (Chorus) We're diving deep, we're reaching wide, In the land of statistics, we confide, StatQuest, on a learning ride, With your wisdom, we abide. (Verse 2) From t-tests to regression trees, You make understanding these a breeze. Explaining variance and degrees, StatQuest, you got the keys. (Chorus) We're scaling heights, we're breaking ground, In your lessons, profound wisdom's found, StatQuest, with your sound, We'll solve the mysteries that surround. (Bridge) With bar charts, line plots, and bell curves, Through distributions, we observe, With every lesson, we absorb and serve, StatQuest, it's knowledge we preserve. (Chorus) We're traversing realms, we're touching sky, In the field of data, your guidance, we rely, StatQuest, with your learning tie, You're the statistical ally. (Outro) So here's to Josh Starmer, our guide, To the realm of stats, you provide, With StatQuest, on a high tide, In the world of statistics, we stride. (End) So get ready, set, quest on, In the realm of stats, dawn upon, StatQuest, till the fear's gone, Keep learning, till the break of dawn.
@statquest 1 year ago
THAT IS AWESOME!!! (what are the chords?)
@technicalbranch99 1 year ago
@statquest I V vi IV
@michaelcharlesthearchangel 8 months ago
Bravo! Excellent teaching skills! Teaching weights and biases is not easy but, by God, man, you've done it!
@statquest 8 months ago
Thank you very much!
@patriciachang5079 1 year ago
You really explain these concepts in a clear way! Will you do more explanation videos on statistics topics, like the Cox model for survival analysis? Thanks! :)
@statquest 1 year ago
I'll keep that in mind.
@quratulainshahnawazz 8 days ago
Absolutely brilliant. These are very helpful for my understanding of the concepts of Deep Learning.
@statquest 8 days ago
Thanks!
@shaktisd 11 months ago
One of the best explanations of the encoder/decoder architecture, especially the self-attention part. I really liked the way you colored Q, K, and V to keep track of how things are moving. Looking forward to more such videos.
@statquest 11 months ago
Thanks! I've also got a video on Decoder-Only Transformers: kzbin.info/www/bejne/mIKYc6Klob1sd8k and I'm working on one that shows the matrix algebra (color coded) of how these things are computed.
@shaktisd 11 months ago
@statquest Are all these topics covered in your book? Would love to read them in printed format.
@statquest 11 months ago
@shaktisd They'll be in my next book.
@shaktisd 11 months ago
@statquest Looking forward to the next edition.
@tamoghnamaitra9901 1 year ago
As a Gen AI consultant, this video was amazing. Absolutely perfect.
@statquest 1 year ago
Thank you!
@alexeecs 1 year ago
I never thought it would be possible to explain something so complicated in such detail without even assuming any calculus or linear algebra background, but I am glad to be proven wrong.
@statquest 1 year ago
Thank you!
@newbie8051 1 month ago
Very thorough, thanks! It uses a lot of the earlier Deep Learning basics; amazing resources you've compiled over the years. I've read about transformers from multiple resources, and each time I understand something new! Prepping for interviews right now; will thank you once again after I land a job 🌞💖
@statquest 1 month ago
Good luck!
@irishchannel120 8 months ago
These videos on machine learning (and statistics!) are incredible and empowering! Plus they make me laugh! You are the reason that I was able to keep my head above water in my Machine Learning courses as a pregnant grad student with limited time, money, and energy. Thank you, and I will definitely be checking out some merch!
@statquest 8 months ago
Thank you so much! I'm happy to hear my videos are useful. BAM! :)
@张雪薇-i9h 8 months ago
The best and simplest video for learning transformers ever!
@statquest 8 months ago
Thank you!
@fgh680 1 year ago
The most AWESOME 36 MINUTES - What an explanation of Transformers!
@statquest 1 year ago
Thank you very much!!! BAM! :)
@artmiss-x8o 2 months ago
The most useful 35 minutes of my life!
@statquest 2 months ago
bam! :)
@pranav7471 7 months ago
A great explanation of Transformers. The one thing I found missing is that the decoder has masked self-attention, to prevent future embeddings from "leaking" into the current output.
@statquest 7 months ago
For an encoder-decoder transformer, masked self-attention is only used during training, which this video doesn't cover. However, I cover it in my video on Decoder-Only Transformers here: kzbin.info/www/bejne/mIKYc6Klob1sd8k
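Editor's sketch of the masking discussed in this thread. It assumes the usual causal-mask approach: similarity scores for later positions are set to negative infinity before the softmax, so each token puts zero weight on tokens that come after it. The names `softmax` and `causal_attention_weights` are illustrative, and the plain-list layout is a simplification, not code from the video.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention_weights(scores):
    """Turn a square matrix of query-key scores into masked attention weights.

    scores[i][j] is the similarity between the query for token i and the
    key for token j. Entries with j > i (future tokens) are masked out.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return weights

scores = [
    [1.0, 2.0, 3.0],
    [1.0, 1.0, 5.0],
    [0.0, 0.0, 0.0],
]
for row in causal_attention_weights(scores):
    print(row)
```

The first token can only attend to itself, so its weights are [1.0, 0.0, 0.0] no matter how large its scores for later tokens are.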
@vohiepthanh9692 1 year ago
Penta BAM!!! All of your videos are extremely easy to understand in a peculiar way. They have helped me a lot. Thank you very much.
@statquest 1 year ago
Glad you like them!
@rohittamidapati6506 5 months ago
Wow, Josh. You’re a genius; this was comprehensive and complete. Thank you again.
@statquest 5 months ago
Glad it was helpful!
@rayankhan5025 4 days ago
Legacy teachers have A LOT to learn from you. Cheers!
@statquest 4 days ago
Thanks!
@anubhavexoticvehicles 28 days ago
You are there, man! A one-stop destination for deep learning concepts! Thank you so much!
@statquest 27 days ago
Thank you!
@lolololo-cx4dp 1 year ago
This really cleared up my doubts about attention mechanisms.
@statquest 1 year ago
bam!
@debayantalapatra2066 1 year ago
This is the best of all that is available right now on Transformers. Thank you!!
@statquest 1 year ago
Thank you!
@syedmustahsan4888 2 months ago
And once again, I am thankful to you for summing up my concepts. Alhumdulillah
@statquest 2 months ago
Thanks!
@vladimirbosinceanu5778 6 months ago
Amazing as always. The internet is a better place because of humans like you. Thank you, Josh!
@statquest 6 months ago
Wow, thank you!
@JavierSanchez-yc8qo 7 months ago
@statquest you are a true professional and a master of your craft. The field of ML is getting a little stronger each day because of content like this!
@statquest 7 months ago
Thank you very much!
@manuelapacheco9129 1 year ago
Man, I love you for this video. Thank you so much; there's absolutely no way I'd have understood all of this without your help.
@statquest 1 year ago
Glad I could help!
@AI_ML_DL_LLM 1 year ago
By far, this is the best explanation of the transformer I've ever seen. The only thing missing (maybe only 5 min) is how the training is done.
@statquest 1 year ago
Thanks! I ended up adding a bit about how to train a normal transformer at the end of my video on decoder-only transformers: kzbin.info/www/bejne/mIKYc6Klob1sd8k?si=b48XvWNZuJ7lvKtq&t=2033
@erikleichtenberg3950 9 months ago
1 million subscribers and still taking the time to answer questions from his viewers. Absolute legend.
@statquest 9 months ago
BAM! :)
@luisfernando5998 9 months ago
Bet it’s an AI bot answering 🤖
@statquest 9 months ago
@luisfernando5998 Nope - it's me. I really read all the comments and respond to as many as I can.
@luisfernando5998 9 months ago
@statquest Do you have a team? 🤔 How do you manage the time? 🤯
@statquest 9 months ago
@luisfernando5998 It only takes about 30 minutes a day. It's not that big of a deal.
@cezarystorczyk1722 1 year ago
Man, you are the best teacher I have ever listened to. Thanks.
@cezarystorczyk1722 1 year ago
It is the first time I'm giving a thumbs up to every video in the playlist.
@statquest 1 year ago
Thank you very much! :)
@fatemehghanadi3046 7 months ago
You explained it really clearly. It was the best transformer video I've watched.
@statquest 7 months ago
Wow, thanks!
@ahmedhossam8741 5 months ago
Amazing explanation throughout the whole series, thank you for everything.
@statquest 5 months ago
Thank you!
@supernenechi 1 year ago
Oh my god! Now it makes sense why training a transformer model takes FAR more compute than it does to run one! I can run them on my CPU, but training a big one may take weeks to months on entire GPU clusters! So many values to optimize and SO many nodes, inputs and outputs! That's insane! And it also shows why the models are SO large! There are SO many weights and biases, SO many connections and an entire world of vocabulary to store in it and connect up! This is SO impressive!
@statquest 1 year ago
BAM! :)
@saumaydudeja7423 1 year ago
This is probably the most awesome video about transformers ever.
@statquest 1 year ago
Thank you! :)
@adithyakumar1111 1 year ago
Thank you, Josh, for this fantastic video. One of the best videos to explain the math behind the Query, Key and Values.
@statquest 1 year ago
Thank you!
@AbhishekSinghSambyal 10 months ago
Dot product can be so powerful. Transformers are beautiful. Thank you for this amazing video.
@statquest 9 months ago
Thanks!
@ruicai9084 1 year ago
I feel so lucky that I just started learning about Transformers and found out StatQuest made a video on them one day ago!
@statquest 1 year ago
bam!
@Isakilll 1 year ago
Just wanted to say that I understood everything about LMs (thanks to your videos), except the part on transformers because the video wasn't out yet haha. Well, now that my dear squash teacher has explained it, everything's clear. So really, THANK YOU for your hard work and dedication; it made all the difference in my understanding of Neural Networks in general.
@statquest 1 year ago
Great to hear!
@dy8576 6 months ago
I keep coming back to this video every time I forget about the inner workings, and it's always just as easy to regather everything. What content!
@statquest 6 months ago
Glad it's helpful! :)
@mugssyy 4 months ago
BAM! Thank you, Josh. This entire playlist is SUPER. 😊
@statquest 4 months ago
Glad you like it!
@dianaayt 5 months ago
(Just letting you know, while cramming for my exam: I've used your videos for over 4 years now, and some professors, including the ones from this class, use your videos for us to learn (either showing them to us or using screenshots from the videos with a reference). It's so funny watching people who aren't familiar with your channel react to a soft, cute bear as the softmax function and such haha. Most people in Portugal can speak English, but many still prefer to search in Portuguese, so they never encountered your channel before. But I make sure to let all my friends know about your content that helps us with some specific exam. I would never be able to thank you enough for the more than wonderful content and for your patience in answering the questions we have in the comments.)
@statquest 5 months ago
Thank you very much! I'm glad to hear that my videos are helpful. Maybe one day I can go to Portugal and teach a lecture in person. That would be super fun. :)
@hamidrezahosseinkhani5980 1 year ago
It was incredible. Step-by-step, clear and concise, detailed enough. Great, great. Thank you for such an amazing video!
@statquest 1 year ago
Glad you enjoyed it!