Coding a ChatGPT Like Transformer From Scratch in PyTorch

Views: 55,536

Comments: 213

@statquest 5 months ago

- You can get the code here: github.com/StatQuest/decoder_transformer_from_scratch
- Learn more about GiveInternet.org: giveinternet.org/StatQuest NOTE: Donations up to $30 will be matched by an Angel Investor - so a $30 donation would give $60 to the organization. DOUBLE BAM!!!
- The full Neural Networks playlist, from the basics to AI, is here: kzbin.info/www/bejne/eaKyl5xqZrGZetk
- Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

@thebirdhasbeencharged 5 months ago

Can't imagine the work that goes into this, writing the code, making diagrams, recording, editing and voice over, you're the goat big J.

@statquest 5 months ago

Thanks!

@thomasalderson368 5 months ago

he is well compensated

@statquest 5 months ago

@@thomasalderson368 am I? Maybe it's relative, but hour for hour I'm making significantly less than I did doing data analysis in a lab.

@FindEdge 5 months ago

@@statquest Sir, we love you and your work. Please don't take such comments to heart! You may never meet us, but there is a generation of statisticians and data scientists who owe a lot to you, maybe all of it!

@statquest 5 months ago

@@FindEdge Thanks!

@gvlokeshkumar 4 days ago

Dear Josh, Thank you so much for this amazing work! I followed the videos from your channel in the order below and it helped me a lot.
1) Seq to Seq
2) Attention for neural networks
3) Transformer neural networks
4) Decoder only transformers
5) Encoder only transformers
6) Coding a ChatGPT like transformer from scratch in PyTorch
This should be a playlist in every ML/AI enthusiast's library. Waiting for your book on neural networks! Hope it releases in India.

@statquest 4 days ago

Awesome! That's the order I have them in my playlist for neural networks and AI: kzbin.info/www/bejne/eaKyl5xqZrGZetk

@techproductowner 5 months ago

You will be remembered for the next 1000 years in the history of Statistics and Data Science. You should be named the "Father of Applied Statistics & Machine Learning". Please thumbs up if you are with me.

@statquest 5 months ago

BAM! :)

@hewas321 5 months ago

Hey Josh, you know what? I used to watch your videos explaining the key ingredients of statistics EVERY DAY in 2020~2021 when I was a freshman. Whatever I clicked among your videos, it was always the first time for me to learn it. I knew nothing. But I still remember what concepts you dealt with in the videos and how you explained them. Fortunately now I work as an AI researcher - it's been a year already - although I am a 3rd grade student. You suddenly came to my mind so I've just taken a look at your channel for the first time in a long time. This time I already knew about everything you explain in the videos. It feels really weird. Everything is all thanks to you and still your explanations are clear, well-visualized and awesome. You are such a big help to the newbies of statistics and machine/deep learning. Always love your works. Please keep it going!!! 🔥

@statquest 5 months ago

Thank you very much! I'm so happy that my videos were helpful for you. BAM! :)

@jahanzebnaeem2525 5 months ago

HUGE RESPECT for all the work you put into your videos

@statquest 5 months ago

Thank you!

@Op1czak 5 months ago

Josh, I want to express my sincerest gratitude. I have been following your videos for years and they have been becoming increasingly more important for my study and career path. You are a hero.

@statquest 5 months ago

Thank you! :)

@yashagrahari 23 days ago

Please continue this playlist! we miss you. 😓

@statquest 22 days ago

Thanks! I just released a new neural networks video, but it doesn't have coding. That said, my book on neural networks comes out early January and has coding tutorials for every major concept.

@TalkOfWang 5 months ago

It is party time! Thanks for uploading!

@statquest 5 months ago

You bet!

@n.h.son1902 5 months ago

You said this was going to come out at the end of May. And I’ve been waiting for this for 2 months. Finally, it’s out 😂

@statquest 5 months ago

I guess better later than never?

@ramwisc1 5 months ago

Wow - have been waiting for this one! Now that I've wrapped my head around word embeddings, time to code this one up! Thank you @statquest!

@statquest 5 months ago

Bam! :)

@muhammadikram375 5 months ago

sir you deserved millions of views on your KZbin ❤❤🎉

@statquest 5 months ago

Thanks!

@bayoudata 5 months ago

Cool, learn a lot from all of your videos Josh! 🤯

@statquest 5 months ago

Thanks!

@shreyashsahare1727 9 days ago

Not this legend saving my life again! (PS: it's my finals week). Lots of love!

@statquest 9 days ago

Good luck!

@Simon-FriedrichBöttger 5 months ago

Thank you very much!

@statquest 5 months ago

TRIPLE BAM!!! Thank you so much for supporting StatQuest!!!

@akshaygs4048 5 months ago

It had been some time since I watched one of your videos. Very good video as always 🎉🎉

@statquest 5 months ago

Thanks! 😃

@abhinavsb9228 4 months ago

100/100 🔥when i search for an explanation video on youtube this is what i expect🔥

@statquest 4 months ago

Thanks!

@gstiebler 5 months ago

Thanks!

@statquest 5 months ago

TRIPLE BAM!!! Thank you for supporting StatQuest!

@pro100gameryt8 5 months ago

Incredible video, Josh! Love your content. Can you please make a video on diffusion models?

@statquest 5 months ago

I'll keep that in mind.

@pro100gameryt8 5 months ago

Thank you very much Josh! Bam @statquest

@jawadmansoor6064 5 months ago

Finally, the long-awaited video has arrived. Thank you.

@statquest 5 months ago

Bam! :)

@Brad-qw1te 5 months ago

I've been trying to make a neural network in C++ for like a month now. I was trying to just use 3b1b's videos but they weren't good enough. But then I found your videos and I'm getting really close to being able to finish the backpropagation algorithm. When I started I thought it would look good on my resume, but now I'm thinking nobody will care, but I'm in too deep to quit.

@statquest 5 months ago

good luck!

@sillypoint2292 5 months ago

This video's amazing man. Not just this one but every video of yours. Before I began actually learning Machine Learning I used to watch your videos just for fun and trust me, they taught me a lot. Thanks for your amazing teaching :) with love from India ❤

@statquest 5 months ago

Great to hear!

@sillypoint2292 5 months ago

@@statquest :)

@SaftigKnackig 2 months ago

I could only watch your videos for getting cheered up by your intro song.

@statquest 2 months ago

bam! :)

@Sravdar 4 months ago

AMAZING VIDEOS. Watched all of your NN playlist in 3 days. And now, reaching the end, I have some questions. One is: what are the future planned videos? And two is: how do you select activation functions? In fact, a video where you create custom models for different problems and explain "why to use this" would be great. No need to explain the math or programming needed for that. Thank you for all of these videos!

@statquest 4 months ago

Thanks! I'm glad you like the videos. My guess is the next one will be about encoder-only transformers. I'm also working on a book about neural networks that includes all the content from the videos plus a few bonus things. I've finished the first draft and will start editing it soon.

@hammry_pommter 3 months ago

Sir, first of all, huge respect for your content. One more request: can you make a video on how to apply transformers to image datasets for different image processing models, like object detection and segmentation? The only thing I can say is that teachers like you make this world more beautiful.

@statquest 3 months ago

Thanks! I'll keep those topics in mind.

@gvascons 5 months ago

Great and very didactic as usual, Josh!! Definitely going to wrap my head around this for a while and try a few tweaks! Do you plan on eventually also discussing other non-NLP topics like GANs and Diffusion Models?

@statquest 5 months ago

One day I hope to.

@neonipun 5 months ago

I'm gonna enjoy this one!

@statquest 5 months ago

bam! :)

@glaudiston 5 months ago

Today we learned that statquest is awesome. triple BAM!

@statquest 5 months ago

Thanks!

@sikandarnadaf7858 1 month ago

Thanks for making it so easy to understand

@statquest 1 month ago

You're welcome!

@elifiremarslan9408 1 month ago

Great video! I like the way you teach!

@statquest 1 month ago

Thanks!

@hasibahmad297 5 months ago

I saw the title and right away knew that it is BAM. Can we expect some data analysis, ML projects from scratch?

@statquest 5 months ago

I hope so.

@jorgesanabria6484 5 months ago

This will be awesome. I am trying to learn the math behind transformers and PyTorch so hopefully this helps give me some intuition

@statquest 5 months ago

I've got a video all about the math behind transformers here: kzbin.info/www/bejne/gaHLnoKAo7F0mqs

@ShadArfMohammed 5 months ago

as always, wonderful content. Thanks :)

@statquest 5 months ago

Thanks again!

@205-cssaurabhmaulekhi9 5 months ago

Thank you I was in need of this 😊

@statquest 5 months ago

Glad it was helpful!

@alexsemchenkov5740 1 month ago

Great job! Thanks a million!

@statquest 1 month ago

Thanks!

@pompymandislian5628 1 month ago

So brilliant! Please create more from-scratch videos, I like them so much. Thank you!

@statquest 1 month ago

Thanks! Will do!

@sinanrobillard2819 24 days ago

Thank you very much for this!!!

@statquest 24 days ago

Thanks!

@Pqrsaw 3 months ago

Loved it! Thank you very much

@statquest 3 months ago

Thank you!

@datasciencepassions4522 5 months ago

God Bless You for the great work you do! Thank you so much

@statquest 5 months ago

Thank you very much! :)

@gigabytechanz9646 1 month ago

Really helpful! Thanks

@statquest 1 month ago

Glad it was helpful!

@Sikandar456 1 month ago

Hi Josh, this video really helped. Can you do one on diffusion models?

@statquest 1 month ago

I'll keep that in mind.

@sharjeel_mazhar 5 months ago

Thank you! You're the best!!!

@statquest 5 months ago

You're welcome!

@iqra2291 3 months ago

Amazing explanation 🎉❤ you are the best 😊

@statquest 3 months ago

Thank you! 😃

@1msirius 1 month ago

I really like your teaching

@statquest 1 month ago

Thank you!

@1msirius 1 month ago

@@statquest I should thank you sir! I love watching your videos!

@sidnath7336 5 months ago

Awesome video! Maybe we can have a part 2 where we incorporate multi-head attention? 👌🏽 And then could make this a series on different decoder models and how they differ e.g., mistral uses RoPE and sliding window attention etc…

@statquest 5 months ago

If you look at the code you'll see how to create multi-headed attention: github.com/StatQuest/decoder_transformer_from_scratch
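
For readers who want a rough picture before opening the repository, here is a minimal sketch of one common way to build multi-head attention: run several independent attention heads side by side, concatenate their outputs, and project back down to `d_model`. The class names `Head` and `MultiHeadAttention` and all sizes are illustrative assumptions; the linked repository may organize this differently.

```python
import torch
import torch.nn as nn


class Head(nn.Module):
    """One masked self-attention head with its own query, key, and value weights."""

    def __init__(self, d_model):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(-2, -1) / (k.size(-1) ** 0.5)
        # Causal mask: each token may only look at itself and earlier tokens.
        seq_len = x.size(-2)
        allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
        scores = scores.masked_fill(~allowed, -1e9)
        return torch.softmax(scores, dim=-1) @ v


class MultiHeadAttention(nn.Module):
    """Hypothetical wrapper: several heads in parallel, then one linear layer."""

    def __init__(self, d_model=2, num_heads=2):
        super().__init__()
        self.heads = nn.ModuleList([Head(d_model) for _ in range(num_heads)])
        self.combine = nn.Linear(num_heads * d_model, d_model)

    def forward(self, x):
        # Concatenate the head outputs along the feature dimension.
        stacked = torch.cat([head(x) for head in self.heads], dim=-1)
        return self.combine(stacked)


torch.manual_seed(0)
encodings = torch.randn(3, 2)  # 3 tokens, d_model = 2
output = MultiHeadAttention(d_model=2, num_heads=2)(encodings)
```

The output keeps the same shape as the input, so a block like this can be swapped in where a single attention head was used.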

@旭哥-r5b 2 months ago

Thank you. You're a lifesaver when I need this to finish my school project. However, if the input contains a varying number of strings, do I add padding afterwards?

@statquest 2 months ago

Yes, you do that when training a batch of inputs with different lengths.

@旭哥-r5b 2 months ago

@@statquest Thank you for your help. However, if I use zero padding and include zero as a valid token in the vocabulary, won't the model end up predicting zero, which is meant to represent padding, thereby making the output meaningless?

@statquest 2 months ago

@@旭哥-r5b You create a special token for padding.

@旭哥-r5b 2 months ago

@@statquest And that token will still be used as the label for training?

@statquest 2 months ago

@@旭哥-r5b I believe that is correct.
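
The padding idea in this thread can be sketched as follows. The token name `<PAD>`, its id, and the toy vocabulary are assumptions made for illustration. The padding token does appear in the label tensor, as the reply says; the thread does not say whether those positions should count toward the loss, so the use of `ignore_index` below is one common option rather than the video's method.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Hypothetical vocabulary; "<PAD>" gets its own id so it never collides
# with a real token (this avoids the "zero is also a word" problem).
token_to_id = {"what": 0, "is": 1, "statquest": 2, "awesome": 3, "<EOS>": 4, "<PAD>": 5}
pad_id = token_to_id["<PAD>"]
vocab_size = len(token_to_id)

# Two label sequences of different lengths (5 tokens and 4 tokens).
labels = [
    torch.tensor([1, 2, 4, 3, 4]),
    torch.tensor([2, 4, 3, 4]),
]
# Shorter sequences are filled on the right with pad_id.
padded_labels = pad_sequence(labels, batch_first=True, padding_value=pad_id)

torch.manual_seed(0)
logits = torch.randn(2, 5, vocab_size)  # random stand-in for model output

# Assumption: padded positions are skipped when averaging the loss.
loss_fn = nn.CrossEntropyLoss(ignore_index=pad_id)
loss = loss_fn(logits.reshape(-1, vocab_size), padded_labels.reshape(-1))
```

With `ignore_index`, the model is never rewarded or penalized for what it predicts at padded positions, so it has no incentive to learn to output the padding token.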

@artofwrick 3 months ago

Hey... Josh, can you please make a Playlist on all the videos on probability that you've posted so far??? Please ❤❤

@statquest 3 months ago

I'll keep that in mind. In the meantime, you can go through the Statistics Fundamentals in this list: statquest.org/video-index/

@miriamramstudio3982 5 months ago

Great video. Thanks

@statquest 5 months ago

Glad you liked it!

@mikinyaa 5 months ago

🎉🎉🎉thank you😊

@statquest 5 months ago

bam! :)

@cuckoo_is_singing 5 months ago

Hi Josh, should the embedding weights be updated during training? For example, nn.embedding(vocab_size,d_model) produces random numbers, and each token is mapped to the related row in our embedding matrix. Should we update these weights during training? The positional embedding weights are constant during training, and the only weights (except other parameters of course, like q,k,v) that are prone to change are our nn.embedding weights! I wrote code for translating amino acids to sequences. Everything in training works well, with accuracy of 95-98%, but in the inference stage I get bad results. I recall my model by loading_path=os.path.join(checkpoint_dir, config['model_name']) model.load_checkpoint(loading_path,model,optimizer) but after the inference loop my result is like: 'tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc tcc ' :( Even if we assume my algorithm has overfitted, we shouldn't get this result! Also, I think other parameters like the dropout factor should not be considered in the inference stage (p=0 for dropout). I mean we shouldn't just reload the best parameters, we should change some parameters (sorry, I spoke a lot :)) )

@statquest 5 months ago

The word embedding weights are updated during training.
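
A small sketch can make this concrete. It runs one optimizer step on a toy `nn.Embedding` and checks which rows moved; the sizes, token ids, and the tiny linear `head` are illustrative assumptions, not code from the video. The last lines touch the dropout point from the question: in PyTorch, calling `.eval()` on a module turns dropout off, which is a general PyTorch fact and not something the reply above addresses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, d_model = 5, 2
embedding = nn.Embedding(vocab_size, d_model)  # trainable by default
head = nn.Linear(d_model, vocab_size)          # hypothetical output layer
params = list(embedding.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

before = embedding.weight.detach().clone()

tokens = torch.tensor([0, 1, 2])   # only rows 0, 1, 2 are looked up
targets = torch.tensor([1, 2, 3])
loss = nn.functional.cross_entropy(head(embedding(tokens)), targets)
loss.backward()
optimizer.step()

# Rows that were used received gradients and changed; unused rows did not.
changed_rows = (embedding.weight.detach() != before).any(dim=1)

# Dropout is active in train mode and becomes the identity in eval mode.
dropout = nn.Dropout(p=0.5)
dropout.eval()
x = torch.ones(4)
same_in_eval = torch.equal(dropout(x), x)
```

In a full model, calling `model.eval()` before inference applies the same switch to every dropout layer inside it.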

@codinghighlightswithsadra7343 3 months ago

Thank you! Can you please explain how we can use a transformer for time series?

@statquest 3 months ago

I'll keep that in mind. But in the meantime, you can think of an input prompt (like "what is statquest?") as a time series dataset, because the words are ordered and occur sequentially. So, based on an ordered sequence of tokens, the transformer generates a prediction about what happens next.
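
The "predict what happens next" framing can be sketched in plain Python: the labels are simply the inputs shifted one step ahead, whether the items are words or measurements. The helper name `next_step_pairs`, the `<EOS>` token name, and the sample readings are assumptions for illustration; real-valued series would typically need to be discretized or handled with a regression output, which this sketch does not cover.

```python
def next_step_pairs(sequence):
    """Return (inputs, labels) where each label is the item that follows its input."""
    if len(sequence) < 2:
        raise ValueError("need at least two items to form an input/label pair")
    return sequence[:-1], sequence[1:]


# A prompt treated as an ordered sequence of tokens.
prompt = ["what", "is", "statquest", "<EOS>", "awesome", "<EOS>"]
inputs, labels = next_step_pairs(prompt)

# The same helper applied to a series of already-discretized readings.
readings = [3, 5, 4, 6, 7]
series_inputs, series_labels = next_step_pairs(readings)
```

At position i, the model sees `inputs[0..i]` and is trained to output `labels[i]`.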

@mohamedthasneem7327 3 months ago

Thank you very much sir...💚

@statquest 3 months ago

Thanks!

@Faisal-cl9iu 5 months ago

Thanks a lot for this free wonderful content. ❤😊

@statquest 5 months ago

Thank you!

@PadaiLikhai-hu6op 1 month ago

never stop making videos, or else i'll track you down and make you eat very spicy chillies

@statquest 1 month ago

bam! :)

@Priyanshuc2425 4 months ago

Please include this in your happy halloween playlist

@statquest 4 months ago

Thanks! Will do! :)

@Priyanshuc2425 4 months ago

@@statquest triple bam :)

@__no_name__ 5 months ago

I want to make a sequence prediction model. How should i test the model? What can i use for inference/ testing? (Not for natural language)

@statquest 5 months ago

I'm pretty sure you can do it just like shown in this video, just swap out the words for the tokens in your sequence.

@thomasalderson368 5 months ago

How about an encoder only classifier to round off the series? thanks

@statquest 5 months ago

I'll keep that in mind.

@danielhernandezmota225 2 months ago

I see that the two inputs have the same length... what would change if I wanted to train with another phrase, for instance: "What awesome statquest" (uses 4 tokens instead of 5). How can I generate an input with torch.tensor where the inputs no longer have the same dimension?

@statquest 2 months ago

It depends. If you want to train everything in a batch, all at once, you can add a "" token and mask that out when calculating attention.
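
Masking out a padding token when calculating attention can be sketched like this. The padding id `5` and the random scores are assumptions for illustration; the mask combines the usual causal rule (no looking ahead) with a rule that no token may attend to a padded position.

```python
import torch

pad_id = 5  # hypothetical id reserved for the padding token
token_ids = torch.tensor([2, 4, 3, 4, pad_id])  # 4 real tokens, 1 pad
seq_len = token_ids.size(0)

# True means "this query (row) may attend to this key (column)".
causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
not_pad = (token_ids != pad_id).unsqueeze(0)  # shape (1, seq_len), broadcast over rows
allowed = causal & not_pad

torch.manual_seed(0)
scores = torch.randn(seq_len, seq_len)  # stand-in for scaled q @ k.T similarities

# A large negative score becomes a weight of 0 after the softmax.
masked_scores = scores.masked_fill(~allowed, -1e9)
attention = torch.softmax(masked_scores, dim=-1)
```

Each row of `attention` still sums to 1, but the column for the padded position contributes nothing, so padding cannot influence the real tokens.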

@Mạnhfefe 5 months ago

thank you sm fr bro

@statquest 5 months ago

Any time!

@zeroonetwothree1298 5 months ago

Legend.

@statquest 5 months ago

@kadirbasol82 10 days ago

What about RWKV's ? can you explain them later ? they seem so lightweight

@statquest 10 days ago

I'll keep that topic in mind.

@gayedemiray 5 months ago

you are the best!!! hooray!!!! 😊

@statquest 5 months ago

Thanks!

@arnabmishra827 5 months ago

What is that extra "import" at line 2, @1.37

@statquest 5 months ago

That's called a typo.

@aadijha14 2 months ago

reply with :) if you think statquest is fully hydrated while recording these

@aadijha14 2 months ago

really excited for the book btw

@statquest 2 months ago

bam! :)

@acasualviewer5861 5 months ago

I'm confused as to why the values would come from the ENCODER when computing the cross attention between the Encoder and Decoder. Shouldn't the values come from the decoder itself? So if I trained a model to translate from English to German, then wanted to switch out the German for Spanish, I'd expect the new decoder to know what to do with the output of the Encoder. But if the values are coming from the Encoder, then this wouldn't work.

@statquest 5 months ago

The idea is that the query in the decoder is used to determine how a potential word in the output is related to the words in the input. This is done by using a query from the decoder and keys for all of the input words in the encoder. Then, once we have established how much (what percentages) a potential word in the output is related to all of the input words, we have to determine what those percentages are of. They are percentages of the values, and thus the values have to come from the encoder. For more details, see: kzbin.info/www/bejne/sKm0qoeBbdaor7s
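
The flow described in this reply can be sketched as encoder-decoder attention: queries come from the decoder, while keys and values come from the encoder. All tensors, sizes, and weight names below are illustrative assumptions with random values, not code from the videos.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 2
encoder_out = torch.randn(3, d_model)  # encodings for 3 input words
decoder_in = torch.randn(2, d_model)   # encodings for 2 output words so far

W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

q = W_q(decoder_in)   # queries: from the decoder
k = W_k(encoder_out)  # keys: from the encoder
v = W_v(encoder_out)  # values: from the encoder

# One row per output word, one column per input word; each row sums to 1.
percentages = torch.softmax(q @ k.T / math.sqrt(d_model), dim=-1)

# Each output word receives a weighted mix of the encoder's values.
cross_attention = percentages @ v
```

Because `percentages` has one column per input word, it can only be multiplied by a matrix with one row per input word, which is why the values are taken from the encoder.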

@kimjong-un4521 1 month ago

Finally completed. Took 1.5 months. God i am so slow

@statquest 1 month ago

BAM! :) It took me over 4 years to make the videos, so 1.5 months isn't bad.

@gastonmorixe 5 months ago

gold

@statquest 5 months ago

Thanks!

@frommarkham424 3 months ago

Optimus prime has been real quiet since this one dropped😬😬😬😬😬

@statquest 3 months ago

@rishabhsoni 5 months ago

Respect

@statquest 5 months ago

Thanks!

@kratos_gow610 10 days ago

13:07 / 31:10 Coding Position Encoding As of now.

@observor-ds3ro 5 months ago

22:50 Hey Josh, you assigned 4 for the number of tokens, but we have 5 tokens (including ), and even in the diagram you are pointing at there are 5 boxes (representing 5 outputs). I got confused. And you know what? Words fail me to say how much you have affected my life, so I won't say anything 😂

@statquest 5 months ago

See 26:46 . At 22:50 we just assign a default value for that parameter, however, we don't use that default value when we create the transformer object at 26:46. Instead, we set it to the number of tokens in the vocabulary.
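
The default-versus-actual distinction can be sketched with a simplified stand-in class. `TinyDecoder`, its layers, and the `<EOS>` token name are assumptions for illustration, not the class from the video: the default in the signature only applies when no value is passed, and passing the vocabulary size overrides it.

```python
import torch.nn as nn


class TinyDecoder(nn.Module):
    """Hypothetical, simplified model: num_tokens=4 is only a default."""

    def __init__(self, num_tokens=4, d_model=2):
        super().__init__()
        self.embed = nn.Embedding(num_embeddings=num_tokens, embedding_dim=d_model)
        self.to_logits = nn.Linear(d_model, num_tokens)


# Assumed 5-token vocabulary (4 words plus an end-of-sequence token).
token_to_id = {"what": 0, "is": 1, "statquest": 2, "awesome": 3, "<EOS>": 4}

# The explicit argument replaces the default of 4.
model = TinyDecoder(num_tokens=len(token_to_id))

# Without an argument, the default is used instead.
default_model = TinyDecoder()
```

Deriving `num_tokens` from `len(token_to_id)` keeps the embedding table and the output layer in step with the vocabulary if tokens are added later.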

@BooleanDisorder 5 months ago

I have imported a torch. Do I light it now?

@statquest 5 months ago

@jayjhaveri1906 1 month ago

love youuu

@statquest 1 month ago

@TheFunofMusic 5 months ago

Triple Bam!!!

@statquest 5 months ago

@tismanasou 5 months ago

Let's start from the basics. ChatGPT is not a transformer. It's an application.

@statquest 5 months ago

Yep, that's correct.

@keeperofthelight9681 5 months ago

Sir can you include how to make the chatbot to hold a conversation with

@statquest 5 months ago

I'll keep that in mind.

@HanqiXiao-x1u 5 months ago

Hooray!

@statquest 5 months ago

@yosimadsu2189 5 months ago

🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻 Please please please show us how to train QVK Weights in detail 🙏🏻🙏🏻🙏🏻🙏🏻🙏🏻 You showed us just a simple call to a function. But we are curious how it does the math, what it trains, and how it changes the values of the weights. ABC

@statquest 5 months ago

Every single weight and bias in a neural network is trained with backpropagation. To learn more about how this process works, see: kzbin.info/www/bejne/f3-ViaB4na5_qpY kzbin.info/www/bejne/n6rRY62adrGcn5o and kzbin.info/www/bejne/fXy9oIJ-jayWgtE

@yosimadsu2189 5 months ago

@@statquest Since the Q, K, and V weights are split and the calculations pass through steps that are not neural network layers, IMHO the backpropagation process is quite tricky. On the other hand, the fit function does not tell us the order of the calculations at each node.
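
One way to see that backpropagation reaches the query, key, and value weights is to run a single backward pass through a tiny attention calculation and inspect the gradients. Everything below (sizes, the squared-output loss, the weight names) is an illustrative assumption. PyTorch's autograd records each operation in the forward pass, including the matrix products and the softmax, and replays them in reverse when `backward()` is called.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, seq_len = 2, 3
W_q = nn.Linear(d_model, d_model, bias=False)
W_k = nn.Linear(d_model, d_model, bias=False)
W_v = nn.Linear(d_model, d_model, bias=False)

x = torch.randn(seq_len, d_model)  # stand-in for position-encoded embeddings
q, k, v = W_q(x), W_k(x), W_v(x)

scores = q @ k.T / d_model ** 0.5
allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
attention = torch.softmax(scores.masked_fill(~allowed, -1e9), dim=-1) @ v

# Toy loss, chosen only so there is something to differentiate.
loss = attention.pow(2).mean()
loss.backward()

# After backward(), every weight matrix holds a gradient of its own shape.
grad_shapes = {
    "W_q": tuple(W_q.weight.grad.shape),
    "W_k": tuple(W_k.weight.grad.shape),
    "W_v": tuple(W_v.weight.grad.shape),
}
```

An optimizer step would then subtract a scaled copy of each gradient from its weight matrix, which is how the values of the weights change during training.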

@Bartosz-o4p 5 months ago

Bam! Peanut Butter and Jaaam ;)

@statquest 5 months ago

@nossonweissman 5 months ago

BAM!!

@statquest 5 months ago

Thanks Nosson!

@kratos_gow610 10 days ago

14:39 / 31:10 Coding Attention As of now

@paslaid 5 months ago

🎉

@statquest 5 months ago

@suika6459 5 months ago

amazinggg

@statquest 5 months ago

Thanks!

@zendr0 5 months ago

Bam!

@statquest 5 months ago

@mousquetaire86 5 months ago

Wish you could be Prime Minister of the United Kingdom!

@statquest 5 months ago

Ha! :)

@أحمدأكرمعامر 5 months ago

Baaaam!❤

@statquest 5 months ago

@김정헌-i8r 5 months ago

GTP :)

@statquest 5 months ago

Corrected! ;)

@Melle-sq4df 5 months ago

in the very first slide the imports are broken at kzbin.info/www/bejne/eWq0hKOiatOgqLs `import torch.nn as nn import` # there's an extra trailing import here.

@statquest 5 months ago

Yep, that's a typo. That's why it's best to download the code. Here's the link: github.com/StatQuest/decoder_transformer_from_scratch