- You can get the code here: github.com/StatQuest/decoder_transformer_from_scratch
- Learn more about GiveInternet.org: giveinternet.org/StatQuest NOTE: Donations up to $30 will be matched by an Angel Investor, so a $30 donation would give $60 to the organization. DOUBLE BAM!!!
- The full Neural Networks playlist, from the basics to AI, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk
- Support StatQuest by buying my book, The StatQuest Illustrated Guide to Machine Learning, or a Study Guide or Merch!!! statquest.org/statquest-store/
@thebirdhasbeencharged 5 months ago
Can't imagine the work that goes into this: writing the code, making the diagrams, recording, editing, and doing the voice-over. You're the GOAT, big J.
@statquest 5 months ago
Thanks!
@thomasalderson368 5 months ago
He is well compensated.
@statquest 5 months ago
@thomasalderson368 Am I? Maybe it's relative, but hour for hour I'm making significantly less than I did doing data analysis in a lab.
@FindEdge 5 months ago
@statquest Sir, we love you and your work; please don't take such comments to heart! You may never meet us, but there is a generation of statisticians and data scientists who owe a lot to you, maybe all of it!
@statquest 5 months ago
@FindEdge Thanks!
@gvlokeshkumar 4 days ago
Dear Josh, thank you so much for this amazing work! I followed the videos from your channel in the order below, and it helped me a lot:
1) Seq2Seq
2) Attention for neural networks
3) Transformer neural networks
4) Decoder-only transformers
5) Encoder-only transformers
6) Coding a ChatGPT-like transformer from scratch in PyTorch
This should be a playlist in every ML/AI enthusiast's library. Waiting for your book on neural networks! Hope it releases in India.
@statquest 4 days ago
Awesome! That's the order I have them in my playlist for neural networks and AI: kzbin.info/www/bejne/eaKyl5xqZrGZetk
@techproductowner 5 months ago
You will be remembered for the next 1000 years in the history of statistics and data science. You should be named the "Father of Applied Statistics & Machine Learning". Please thumbs up if you are with me.
@statquest 5 months ago
BAM! :)
@hewas321 5 months ago
Hey Josh, you know what? I used to watch your videos explaining the key ingredients of statistics EVERY DAY in 2020-2021 when I was a freshman. Whichever of your videos I clicked, it was always my first time learning the topic. I knew nothing. But I still remember which concepts you covered in your videos and how you explained them. Fortunately, I now work as an AI researcher (it's been a year already), even though I'm only a third-year student. You suddenly came to mind, so I've just taken a look at your channel for the first time in a long time. This time I already knew all of what you explain in the videos. It feels really weird. It's all thanks to you, and your explanations are still clear, well-visualized, and awesome. You are such a big help to newbies in statistics and machine/deep learning. Always love your work. Please keep it going!!! 🔥
@statquest 5 months ago
Thank you very much! I'm so happy that my videos were helpful for you. BAM! :)
@jahanzebnaeem2525 5 months ago
HUGE RESPECT for all the work you put into your videos.
@statquest 5 months ago
Thank you!
@Op1czak 5 months ago
Josh, I want to express my sincerest gratitude. I have been following your videos for years, and they have become increasingly important for my studies and career path. You are a hero.
@statquest 5 months ago
Thank you! :)
@yashagrahari 23 days ago
Please continue this playlist! We miss you. 😓
@statquest 22 days ago
Thanks! I just released a new neural networks video, but it doesn't have coding. That said, my book on neural networks comes out in early January and has coding tutorials for every major concept.
@TalkOfWang 5 months ago
It is party time! Thanks for uploading!
@statquest 5 months ago
You bet!
@n.h.son1902 5 months ago
You said this was going to come out at the end of May, and I've been waiting for it for 2 months. Finally, it's out 😂
@statquest 5 months ago
I guess better late than never?
@ramwisc1 5 months ago
Wow, I have been waiting for this one! Now that I've wrapped my head around word embeddings, it's time to code this one up! Thank you @statquest!
@statquest 5 months ago
Bam! :)
@muhammadikram375 5 months ago
Sir, you deserve millions of views on YouTube ❤❤🎉
@statquest 5 months ago
Thanks!
@bayoudata 5 months ago
Cool, I learn a lot from all of your videos, Josh! 🤯
@statquest 5 months ago
Thanks!
@shreyashsahare1727 9 days ago
Not this legend saving my life again! (PS: it's my finals week.) Lots of love!
@statquest 9 days ago
Good luck!
@Simon-FriedrichBöttger 5 months ago
Thank you very much!
@statquest 5 months ago
TRIPLE BAM!!! Thank you so much for supporting StatQuest!!!
@akshaygs4048 5 months ago
It had been some time since I watched one of your videos. Very good video, as always 🎉🎉
@statquest 5 months ago
Thanks! 😃
@abhinavsb9228 4 months ago
100/100 🔥 When I search for an explanation video on YouTube, this is what I expect 🔥
@statquest 4 months ago
Thanks!
@gstiebler 5 months ago
Thanks!
@statquest 5 months ago
TRIPLE BAM!!! Thank you for supporting StatQuest!
@pro100gameryt8 5 months ago
Incredible video, Josh! Love your content. Can you please make a video on diffusion models?
@statquest 5 months ago
I'll keep that in mind.
@pro100gameryt8 5 months ago
Thank you very much, Josh! Bam @statquest
@jawadmansoor6064 5 months ago
The greatly awaited video has finally arrived. Thank you.
@statquest 5 months ago
Bam! :)
@Brad-qw1te 5 months ago
I've been trying to make a neural network in C++ for about a month now. I was trying to just use 3b1b's videos, but they weren't good enough. Then I found your videos, and I'm getting really close to finishing the backpropagation algorithm. When I started, I thought it would look good on my resume, but now I'm thinking nobody will care. Still, I'm in too deep to quit.
@statquest 5 months ago
Good luck!
@sillypoint2292 5 months ago
This video's amazing, man. Not just this one but every video of yours. Before I actually began learning machine learning, I used to watch your videos just for fun, and trust me, they taught me a lot. Thanks for your amazing teaching :) With love from India ❤
@statquest 5 months ago
Great to hear!
@sillypoint2292 5 months ago
@statquest :)
@SaftigKnackig 2 months ago
I could watch your videos just to get cheered up by your intro song.
@statquest 2 months ago
bam! :)
@Sravdar 4 months ago
AMAZING VIDEOS. I watched your whole neural networks playlist in 3 days, and now, reaching the end, I have some questions. One: what videos are planned for the future? Two: how do you select activation functions? In fact, a video where you create custom models for different problems and explain "why to use this" would be great. No need to explain the math or programming needed for that. Thank you for all of these videos!
@statquest 4 months ago
Thanks! I'm glad you like the videos. My guess is the next one will be about encoder-only transformers. I'm also working on a book about neural networks that includes all the content from the videos plus a few bonus things. I've finished the first draft and will start editing it soon.
@hammry_pommter 3 months ago
Sir, first of all, huge respect for your content. One more request: can you make a video on how to apply transformers to image datasets for different image-processing tasks, like object detection and segmentation? Teachers like you make this world more beautiful.
@statquest 3 months ago
Thanks! I'll keep those topics in mind.
@gvascons 5 months ago
Great and very didactic as usual, Josh!! I'm definitely going to wrap my head around this for a while and try a few tweaks! Do you plan on eventually discussing other non-NLP topics like GANs and diffusion models?
@statquest 5 months ago
One day I hope to.
@neonipun 5 months ago
I'm gonna enjoy this one!
@statquest 5 months ago
bam! :)
@glaudiston 5 months ago
Today we learned that StatQuest is awesome. Triple BAM!
@statquest 5 months ago
Thanks!
@sikandarnadaf7858 1 month ago
Thanks for making it so easy to understand.
@statquest 1 month ago
You're welcome!
@elifiremarslan9408 1 month ago
Great video! I like the way you teach!
@statquest 1 month ago
Thanks!
@hasibahmad297 5 months ago
I saw the title and knew right away that it is BAM. Can we expect some data analysis and ML projects from scratch?
@statquest 5 months ago
I hope so.
@jorgesanabria6484 5 months ago
This will be awesome. I am trying to learn the math behind transformers and PyTorch, so hopefully this helps give me some intuition.
@statquest 5 months ago
I've got a video all about the math behind transformers here: kzbin.info/www/bejne/gaHLnoKAo7F0mqs
@ShadArfMohammed 5 months ago
As always, wonderful content. Thanks :)
@statquest 5 months ago
Thanks again!
@205-cssaurabhmaulekhi9 5 months ago
Thank you, I was in need of this 😊
@statquest 5 months ago
Glad it was helpful!
@alexsemchenkov5740 1 month ago
Great job! Thanks a million!
@statquest 1 month ago
Thanks!
@pompymandislian5628 1 month ago
So brilliant! Please create more from-scratch videos; I like them so much. Thank you!
@statquest 1 month ago
Thanks! Will do!
@sinanrobillard2819 24 days ago
Thank you very much for this!!!
@statquest 24 days ago
Thanks!
@Pqrsaw 3 months ago
Loved it! Thank you very much
@statquest 3 months ago
Thank you!
@datasciencepassions4522 5 months ago
God bless you for the great work you do! Thank you so much
@statquest 5 months ago
Thank you very much! :)
@gigabytechanz9646 1 month ago
Really helpful! Thanks
@statquest 1 month ago
Glad it was helpful!
@Sikandar456 1 month ago
Hi Josh, this video really helped. Can you do one on diffusion models?
@statquest 1 month ago
I'll keep that in mind.
@sharjeel_mazhar 5 months ago
Thank you! You're the best!!!
@statquest 5 months ago
You're welcome!
@iqra2291 3 months ago
Amazing explanation 🎉❤ You are the best 😊
@statquest 3 months ago
Thank you! 😃
@1msirius 1 month ago
I really like your teaching.
@statquest 1 month ago
Thank you!
@1msirius 1 month ago
@statquest I should thank you, sir! I love watching your videos!
@sidnath7336 5 months ago
Awesome video! Maybe we can have a part 2 where we incorporate multi-head attention? 👌🏽 This could then become a series on different decoder models and how they differ, e.g., Mistral uses RoPE and sliding-window attention, etc.
@statquest 5 months ago
If you look at the code, you'll see how to create multi-headed attention: github.com/StatQuest/decoder_transformer_from_scratch
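For readers who want the idea before opening the repo, here is a minimal sketch of one way to extend single-head attention to multiple heads: run several independent sets of query/key/value weights and concatenate the head outputs. The class and variable names here are illustrative, not necessarily the repo's exact code.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    # Illustrative sketch: several independent attention heads whose
    # outputs are concatenated (names are not the repo's exact code).
    def __init__(self, d_model=2, num_heads=2):
        super().__init__()
        self.heads = nn.ModuleList()
        for _ in range(num_heads):
            self.heads.append(nn.ModuleDict({
                "W_q": nn.Linear(d_model, d_model, bias=False),
                "W_k": nn.Linear(d_model, d_model, bias=False),
                "W_v": nn.Linear(d_model, d_model, bias=False),
            }))

    def forward(self, x, mask=None):
        outputs = []
        for head in self.heads:
            q, k, v = head["W_q"](x), head["W_k"](x), head["W_v"](x)
            scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
            if mask is not None:
                scores = scores.masked_fill(mask, float("-inf"))
            outputs.append(torch.softmax(scores, dim=-1) @ v)
        return torch.cat(outputs, dim=-1)  # (seq_len, num_heads * d_model)

x = torch.randn(4, 2)   # 4 tokens, d_model = 2
mha = MultiHeadAttention()
print(mha(x).shape)     # torch.Size([4, 4])
```

Production implementations usually shrink each head to d_model / num_heads dimensions and add a final output projection; this version keeps each head at full width for clarity.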
@旭哥-r5b 2 months ago
Thank you. You're a lifesaver; I need this to finish my school project. However, if the inputs contain varying numbers of tokens, do I add padding afterwards?
@statquest 2 months ago
Yes, you do that when training a batch of inputs with different lengths.
@旭哥-r5b 2 months ago
@statquest Thank you for your help. However, if I use zero padding and include zero as a valid token in the vocabulary, won't the model end up predicting zero, which is meant to represent padding, thereby making the output meaningless?
@statquest 2 months ago
@旭哥-r5b You create a special token for padding.
@旭哥-r5b 2 months ago
@statquest And that token will still be used as the label for training?
@statquest 2 months ago
@旭哥-r5b I believe that is correct.
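A common refinement here, standard PyTorch practice rather than something from the video: give the padding token its own ID and tell the loss function to ignore it, so the model is never pushed to predict padding. The tiny vocabulary below is made up for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical 6-token vocabulary with a dedicated <PAD> token.
vocab = {"<PAD>": 0, "<EOS>": 1, "what": 2, "is": 3, "statquest": 4, "awesome": 5}
pad_id = vocab["<PAD>"]

# Two sequences of different lengths, padded to the same length.
batch = torch.tensor([[2, 3, 4, 1],        # what is statquest <EOS>
                      [5, 4, 1, pad_id]])  # awesome statquest <EOS> <PAD>

# ignore_index makes the loss skip positions whose label is <PAD>.
loss_fn = nn.CrossEntropyLoss(ignore_index=pad_id)

# Random logits stand in for model output; in real next-token training
# the labels would be the inputs shifted by one position.
logits = torch.randn(2, 4, len(vocab))          # (batch, seq_len, vocab)
loss = loss_fn(logits.reshape(-1, len(vocab)),  # flatten for the loss
               batch.reshape(-1))
```

With this setup the padded position contributes nothing to the gradient, so the model never learns to emit the padding token.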
@artofwrick 3 months ago
Hey Josh, can you please make a playlist of all the videos on probability that you've posted so far? Please ❤❤
@statquest 3 months ago
I'll keep that in mind. In the meantime, you can go through the Statistics Fundamentals in this list: statquest.org/video-index/
@miriamramstudio3982 5 months ago
Great video. Thanks
@statquest 5 months ago
Glad you liked it!
@mikinyaa 5 months ago
🎉🎉🎉 Thank you 😊
@statquest 5 months ago
bam! :)
@cuckoo_is_singing 5 months ago
Hi Josh, should the embedding weights be updated during training? For example, nn.Embedding(vocab_size, d_model) produces random numbers, and each token refers to its related row in the embedding matrix. Should we update these weights during training? The positional encoding values are constant during training, and the only weights prone to change (besides the other parameters, of course, like Q, K, and V) are the nn.Embedding weights. I wrote code for translating amino acids to sequences. Everything in training works well, with 95-98% accuracy, but in the inference stage I get bad results. I reload my model with:
loading_path = os.path.join(checkpoint_dir, config['model_name'])
model.load_checkpoint(loading_path, model, optimizer)
but after the inference loop my result looks like 'tcc tcc tcc tcc tcc tcc ...' :( Even if we assume my algorithm has overfit, we shouldn't get this result! I also think parameters like dropout should not be active in the inference stage (p=0 for dropout); I mean, we shouldn't just reload the best parameters, we should also change some settings. (Sorry, I spoke a lot :))
@statquest 5 months ago
The word embedding weights are updated during training.
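A tiny experiment confirms this: the rows of nn.Embedding's weight matrix receive gradients for exactly the tokens that were looked up, so the optimizer updates them like any other weights.

```python
import torch
import torch.nn as nn

# nn.Embedding is a trainable lookup table: one row of weights per token.
embed = nn.Embedding(num_embeddings=5, embedding_dim=2)

tokens = torch.tensor([0, 3])      # pretend tokens 0 and 3 appeared in a batch
embed(tokens).sum().backward()     # backpropagate through the lookup

used = embed.weight.grad.abs().sum(dim=1)   # gradient mass per vocabulary row
print(used)  # rows 0 and 3 are non-zero; rows 1, 2, and 4 stay at zero
```

Only the rows that were actually used get non-zero gradients, which is why rare tokens train more slowly than common ones.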
@codinghighlightswithsadra7343 3 months ago
Thank you! Can you please explain how we can use transformers for time series?
@statquest 3 months ago
I'll keep that in mind. But in the meantime, you can think of an input prompt (like "what is statquest?") as a time series dataset, because the words are ordered and occur sequentially. So, based on an ordered sequence of tokens, the transformer generates a prediction about what happens next.
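One simple way to apply that idea to a numeric series — an illustration of the general approach, not something covered in the video — is to discretize the values into bins, so each bin becomes a "word" in the vocabulary, and then train next-token prediction as usual:

```python
import torch

# Illustrative idea: bin a continuous series into discrete tokens so a
# decoder-only transformer can do next-token prediction on it.
series = torch.tensor([0.1, 0.35, 0.62, 0.81, 0.95])

num_bins = 4
# Interior bin edges at 0.25, 0.5, 0.75; bucketize maps each value
# to the index of the bin it falls into.
edges = torch.linspace(0, 1, num_bins + 1)[1:-1]
tokens = torch.bucketize(series, edges)
print(tokens)   # tensor([0, 1, 2, 3, 3])
```

The resulting token IDs can be fed to the same training loop as word tokens; finer bins trade vocabulary size against precision.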
@mohamedthasneem7327 3 months ago
Thank you very much, sir...💚
@statquest 3 months ago
Thanks!
@Faisal-cl9iu 5 months ago
Thanks a lot for this free, wonderful content. ❤😊
@statquest 5 months ago
Thank you!
@PadaiLikhai-hu6op 1 month ago
Never stop making videos, or else I'll track you down and make you eat very spicy chillies.
@statquest 1 month ago
bam! :)
@Priyanshuc2425 4 months ago
Please include this in your Happy Halloween playlist.
@statquest 4 months ago
Thanks! Will do! :)
@Priyanshuc2425 4 months ago
@statquest Triple bam :)
@__no_name__ 5 months ago
I want to make a sequence prediction model. How should I test the model? What can I use for inference/testing? (Not for natural language.)
@statquest 5 months ago
I'm pretty sure you can do it just like shown in this video; just swap out the words for the tokens in your sequence.
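For inference, the usual test is a generation loop: feed the model a prompt, take the most likely next token, append it, and repeat until an end-of-sequence token appears. Here is a minimal, model-agnostic sketch; the stand-in "model" is hypothetical, just to make the loop runnable.

```python
import torch

# Greedy decoding loop: the model eats its own predictions until <EOS>.
def generate(model, prompt_ids, eos_id, max_new_tokens=10):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model(torch.tensor(ids))   # (seq_len, vocab_size)
        next_id = int(logits[-1].argmax())  # greedy: most likely next token
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids

# Hypothetical stand-in model: vocabulary of 3, always favors token 1.
fake_model = lambda ids: torch.tensor([[0.0, 1.0, 0.5]] * len(ids))
print(generate(fake_model, [2, 0], eos_id=1))  # [2, 0, 1]
```

Swapping `argmax` for sampling from `softmax(logits[-1])` gives more varied outputs, which matters once the model is trained on real data.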
@thomasalderson368 5 months ago
How about an encoder-only classifier to round off the series? Thanks.
@statquest 5 months ago
I'll keep that in mind.
@danielhernandezmota225 2 months ago
I see that the two inputs have the same length... what would change if I wanted to train with another phrase, for instance, "What awesome statquest" (which uses 4 tokens instead of 5)? How can I generate an input with torch.tensor when the inputs no longer have the same dimension?
@statquest 2 months ago
It depends. If you want to train everything in a batch, all at once, you can add a special padding token and mask it out when calculating attention.
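To make the masking part concrete, here is a sketch (using made-up tensors, not the video's exact code) of combining the usual causal mask with a padding mask, so padded positions are never attended to:

```python
import torch

seq_len = 4
# Causal mask: True marks key positions a query may NOT attend to.
causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

# Suppose the last slot of this 4-slot input is a padding token.
is_pad = torch.tensor([False, False, False, True])

# Hide padded keys from every query, on top of the causal rule.
combined = causal | is_pad.unsqueeze(0)

# Later, before softmax:
# scores = scores.masked_fill(combined, float("-inf"))
print(combined)
```

After `masked_fill`, the softmax assigns zero attention weight to both future tokens and padding, so shorter phrases behave as if the padding were not there.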
@Mạnhfefe 5 months ago
Thank you so much, for real, bro.
@statquest 5 months ago
Any time!
@zeroonetwothree1298 5 months ago
Legend.
@statquest 5 months ago
:)
@kadirbasol82 10 days ago
What about RWKVs? Can you explain them later? They seem so lightweight.
@statquest 10 days ago
I'll keep that topic in mind.
@gayedemiray 5 months ago
You are the best!!! Hooray!!!! 😊
@statquest 5 months ago
Thanks!
@arnabmishra827 5 months ago
What is that extra "import" on line 2, at 1:37?
@statquest 5 months ago
That's called a typo.
@aadijha14 2 months ago
Reply with :) if you think StatQuest is fully hydrated while recording these.
@aadijha14 2 months ago
Really excited for the book, btw.
@statquest 2 months ago
bam! :)
@acasualviewer5861 5 months ago
I'm confused as to why the values would come from the ENCODER when computing the cross-attention between the encoder and decoder. Shouldn't the values come from the decoder itself? If I trained a model to translate from English to German, then wanted to swap out the German for Spanish, I'd expect the new decoder to know what to do with the output of the encoder. But if the values come from the encoder, this wouldn't work.
@statquest 5 months ago
The idea is that a query in the decoder is used to determine how a potential word in the output is related to the words in the input. This is done by using a query from the decoder and keys for all of the input words in the encoder. Then, once we have established how much (what percentage) a potential word in the output is related to each of the input words, we have to determine what those percentages are of. They are percentages of the values, and thus the values have to come from the encoder. For more details, see: kzbin.info/www/bejne/sKm0qoeBbdaor7s
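The tensor shapes make this concrete. Below is a stripped-down sketch of cross-attention (illustrative names; batching and multiple heads omitted): queries come from decoder tokens, while keys and values come from encoder tokens, so each output position is a weighted sum of encoder values.

```python
import torch
import torch.nn as nn

d_model = 2
W_q = nn.Linear(d_model, d_model, bias=False)  # applied to DECODER tokens
W_k = nn.Linear(d_model, d_model, bias=False)  # applied to ENCODER tokens
W_v = nn.Linear(d_model, d_model, bias=False)  # applied to ENCODER tokens

encoder_out = torch.randn(5, d_model)  # 5 input (source) tokens
decoder_x = torch.randn(3, d_model)    # 3 output (target) tokens so far

q = W_q(decoder_x)
k, v = W_k(encoder_out), W_v(encoder_out)

# (3, 5): one row of percentages per output token, over the 5 input tokens.
weights = torch.softmax(q @ k.T / (d_model ** 0.5), dim=-1)
out = weights @ v   # each output token is a weighted sum of ENCODER values
print(out.shape)    # torch.Size([3, 2])
```

Note the (3, 5) weight matrix: the decoder can have a different number of tokens than the encoder, which is exactly why values must live on the encoder side.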
@kimjong-un4521 1 month ago
Finally completed. It took 1.5 months. God, I am so slow.
@statquest 1 month ago
BAM! :) It took me over 4 years to make the videos, so 1.5 months isn't bad.
@gastonmorixe 5 months ago
Gold.
@statquest 5 months ago
Thanks!
@frommarkham424 3 months ago
Optimus Prime has been real quiet since this one dropped 😬😬😬😬😬
@statquest 3 months ago
:)
@rishabhsoni 5 months ago
Respect
@statquest 5 months ago
Thanks!
@kratos_gow610 10 days ago
13:07 / 31:10, Coding Position Encoding, as of now.
@observor-ds3ro 5 months ago
22:50 Hey Josh, you assigned 4 to the number of tokens, but we have 5 tokens (including the special token); even in the diagram, as you are pointing, there are 5 boxes (representing 5 outputs). I got confused. And you know what? Words fail me to say how much you have affected my life... so I won't say anything 😂
@statquest 5 months ago
See 26:46. At 22:50 we just assign a default value for that parameter; however, we don't use that default value when we create the transformer object at 26:46. Instead, we set it to the number of tokens in the vocabulary.
@BooleanDisorder 5 months ago
I have imported a torch. Do I light it now?
@statquest 5 months ago
:)
@jayjhaveri1906 1 month ago
Love youuu
@statquest 1 month ago
:)
@TheFunofMusic 5 months ago
Triple Bam!!!
@statquest 5 months ago
:)
@tismanasou 5 months ago
Let's start with the basics. ChatGPT is not a transformer. It's an application.
@statquest 5 months ago
Yep, that's correct.
@keeperofthelight9681 5 months ago
Sir, can you cover how to make the chatbot hold a conversation?
@statquest 5 months ago
I'll keep that in mind.
@HanqiXiao-x1u 5 months ago
Hooray!
@statquest 5 months ago
:)
@yosimadsu2189 5 months ago
🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻 Please, please, please show us how to train the QKV weights in detail 🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻 You showed us just a simple function call, but we are curious how it does the math, what gets trained, and how it changes the values of the weights.
@statquest 5 months ago
Every single weight and bias in a neural network is trained with backpropagation. To learn more about how this process works, see: kzbin.info/www/bejne/f3-ViaB4na5_qpY kzbin.info/www/bejne/n6rRY62adrGcn5o and kzbin.info/www/bejne/fXy9oIJ-jayWgtE
@yosimadsu2189 5 months ago
@statquest Since the QKV weights are split and the calculations pass through operations that aren't ordinary neural network layers, IMHO the backpropagation process is quite tricky. On the other hand, the fit function does not show the order of calculations at each node.
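There is nothing extra to hand-code, though: every operation in attention (the linear projections, the scaled matrix multiplies, the softmax) is differentiable, so a single call to backward() computes gradients for the Q, K, and V weights automatically. A minimal demonstration:

```python
import torch
import torch.nn as nn

# Autograd handles the "tricky" part: matmul, scaling, and softmax are
# all differentiable, so gradients flow back to W_q, W_k, and W_v.
d_model = 2
W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

x = torch.randn(4, d_model)  # 4 token embeddings
scores = W_q(x) @ W_k(x).T / (d_model ** 0.5)
out = torch.softmax(scores, dim=-1) @ W_v(x)

out.sum().backward()   # backpropagation in one call
print(W_q.weight.grad is not None)  # True: the query weights got gradients
```

PyTorch records the chain of operations during the forward pass, so the "order of calculations" the commenter asks about is exactly the recorded graph, replayed in reverse.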
@Bartosz-o4p 5 months ago
Bam! Peanut butter and jaaam ;)
@statquest 5 months ago
:)
@nossonweissman 5 months ago
BAM!!
@statquest 5 months ago
Thanks, Nosson!
@kratos_gow610 10 days ago
14:39 / 31:10, Coding Attention, as of now.
@paslaid 5 months ago
🎉
@statquest 5 months ago
:)
@suika6459 5 months ago
Amazinggg
@statquest 5 months ago
Thanks!
@zendr0 5 months ago
Bam!
@statquest 5 months ago
:)
@mousquetaire86 5 months ago
Wish you could be Prime Minister of the United Kingdom!
@statquest 5 months ago
Ha! :)
@أحمدأكرمعامر 5 months ago
Baaaam! ❤
@statquest 5 months ago
:)
@김정헌-i8r 5 months ago
GTP :)
@statquest 5 months ago
Corrected! ;)
@Melle-sq4df 5 months ago
In the very first slide, the imports are broken at kzbin.info/www/bejne/eWq0hKOiatOgqLs: `import torch.nn as nn import` (there's an extra trailing "import" here).
@statquest 5 months ago
Yep, that's a typo. That's why it's best to download the code. Here's the link: github.com/StatQuest/decoder_transformer_from_scratch
@gustavojuantorena 5 months ago
🎉🎉🎉
@statquest 5 months ago
Triple 🎉!
@louislim2316 2 months ago
Triple Bam :)
@statquest 2 months ago
:)
@naromsky 5 months ago
From scratch in PyTorch, huh.
@statquest 5 months ago
I decided to skip doing it in assembly. ;)
@ckq 5 months ago
GTP
@statquest 5 months ago
Corrected! :)
@isaacsalzman 5 months ago
Ya misspelled ChatGPT: Generative Pre-trained Transformer.
@statquest 5 months ago
Corrected! :)
@lamlamnguyen7093 5 months ago
Damnn, bro 😮😮😮😮
@statquest 5 months ago
:)
@frommarkham424 3 months ago
ARTIFICIAL NEURAL NETWORKS ARE AWESOMEEEEEEEEEE🔥🔥🔥🔥🦾🦾🦾🗣🗣🗣🗣💯💯💯💯