Building makemore Part 2: MLP

  277,845 views

Andrej Karpathy

1 day ago

We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.).
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- colab notebook (new)!!!: colab.research.google.com/dri...
- Bengio et al. 2003 MLP language model paper (pdf): www.jmlr.org/papers/volume3/b...
- my website: karpathy.ai
- my twitter: / karpathy
- (new) Neural Networks: Zero to Hero series Discord channel: / discord , for people who'd like to chat more and go beyond youtube comments
Useful links:
- PyTorch internals ref blog.ezyang.com/2019/05/pytorc...
Exercises:
- E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2
- E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)?
- E03: Read the Bengio et al 2003 paper (link above), implement and try any idea from the paper. Did it work?
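For part (1) of E02, the reference value can be worked out by hand. The sketch below assumes a vocabulary of 27 tokens (26 letters plus the '.' marker); the variable names are illustrative and not taken from the notebook.

```python
import math

# Assumption: 27 possible next characters (26 letters plus the '.' token).
vocab_size = 27

# Under a perfectly uniform prediction every character gets probability 1/27,
# so the negative log likelihood of the correct character is -log(1/27).
uniform_prob = 1.0 / vocab_size
uniform_loss = -math.log(uniform_prob)

print(f"loss under a uniform prediction: {uniform_loss:.4f}")
```

A starting loss far above this value suggests the network is confidently wrong at initialization, which is what part (2) asks you to tune away.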
Chapters:
00:00:00 intro
00:01:48 Bengio et al. 2003 (MLP language model) paper walkthrough
00:09:03 (re-)building our training dataset
00:12:19 implementing the embedding lookup table
00:18:35 implementing the hidden layer + internals of torch.Tensor: storage, views
00:29:15 implementing the output layer
00:29:53 implementing the negative log likelihood loss
00:32:17 summary of the full network
00:32:49 introducing F.cross_entropy and why
00:37:56 implementing the training loop, overfitting one batch
00:41:25 training on the full dataset, minibatches
00:45:40 finding a good initial learning rate
00:53:20 splitting up the dataset into train/val/test splits and why
01:00:49 experiment: larger hidden layer
01:05:27 visualizing the character embeddings
01:07:16 experiment: larger embedding size
01:11:46 summary of our final code, conclusion
01:13:24 sampling from the model
01:14:55 Google Colab (new!!) notebook advertisement
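The chapters from the embedding lookup through F.cross_entropy describe one forward pass. Below is a minimal sketch of that pass on random data. The sizes (context of 3, embedding size 10, 200 hidden units, batch of 32) follow figures mentioned in the comments; the 0.1 weight scaling and the seed are assumptions made here to keep the numbers tame, not the initialization used in the video.

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(0)  # arbitrary seed, for reproducibility only
vocab_size, block_size, n_embd, n_hidden, batch = 27, 3, 10, 200, 32

# Parameters (the 0.1 scaling is an assumption of this sketch)
C = torch.randn((vocab_size, n_embd), generator=g)                    # embedding table
W1 = torch.randn((block_size * n_embd, n_hidden), generator=g) * 0.1  # hidden layer
b1 = torch.randn(n_hidden, generator=g) * 0.1
W2 = torch.randn((n_hidden, vocab_size), generator=g) * 0.1           # output layer
b2 = torch.randn(vocab_size, generator=g) * 0.1

# Random stand-ins for a minibatch of contexts and target characters
X = torch.randint(0, vocab_size, (batch, block_size), generator=g)
Y = torch.randint(0, vocab_size, (batch,), generator=g)

emb = C[X]                                     # (32, 3, 10): one vector per context char
h = torch.tanh(emb.view(batch, -1) @ W1 + b1)  # (32, 200): concatenate, then hidden layer
logits = h @ W2 + b2                           # (32, 27): one score per next character

# Negative log likelihood written out by hand ...
counts = logits.exp()
probs = counts / counts.sum(1, keepdim=True)
manual_loss = -probs[torch.arange(batch), Y].log().mean()

# ... and the fused, numerically safer equivalent
loss = F.cross_entropy(logits, Y)
print(manual_loss.item(), loss.item())
```

The two losses should agree up to floating-point error; the fused call avoids materializing the intermediate tensors and stays stable when logits are large.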

Comments: 332
@OlleMattsson 1 year ago
0% hype. 100% substance. GOLD!
@farhadkarimi 1 year ago
It’s insanely awesome that you are taking time out of your day to provide the public with educational videos like these.
@WouterHalswijk 1 year ago
I'm a senior aerospace engineer, so no CS or ML training at all, and I'm now totally fascinated with PyTorch. First that micrograd intro, which totally clicked the methods used for backprop into place. Now this intro with embedding and data preparation etc. I almost feel like transformers are within reach already. Inspiring!
@rajaahdhananjey4803 1 year ago
Quality Engineer with a Production Engineering background. Same feeling !
@staggeredextreme8213 3 months ago
How you guys landed here, i mean me as a cs graduate, I'll never land directly to a lecture series of aerospace that suddenly start to make sense 🤔
@rmajdodin 1 year ago
A two hour workshop on NLP with transformers costs $149 at the Nvidia GTC conference. You tutor us with amazing quality for free. Thank you!🙂
@rayallinkh 1 year ago
Pls continue this series(and similar ones) to eternity! You are THE teacher which everyone interested/working in AI really needs!
@jerinjohnkachirackal 11 months ago
+1(00000000)
@peterwangsc 4 months ago
This is amazing. Using just a little bit of what I was able to learn from part 3, namely the Kaiming init, and turning back on the learning rate decay, I was able to achieve 2.03 and 2.04 in my test and validation with a 1.89 in my training loss with just 300k iterations and 23k parameters. I set my block size to 4 and my embeddings to 12 and increased my hidden layer to 300 while decaying my learning rate exponent from -1 to -3 linear space over the 300k steps. All that without even using batch normalization yet. After applying batch norm, was able to get these down to 1.99 and 1.98 with training loss in the 1.7s after a little more tweaking. Really good content in this lecture, it really has me feeling like a chef in the kitchen almost, cooking up a model with a few turns of the knobs...This sounds like a game or a problem that can be solved with an AI trained on turning knobs.
@peterwangsc 4 months ago
intuition: why 4 block size instead of 3 block size? well the english language i think has an average of somewhere between 3 to 5 characters per syllable, with most 1 syllable names falling in that 3-5 character bucket and some 2 syllable names falling in that 4-6 character bucket and beyond. I wanted a block size that would give some better indication on whether we're in a one syllable or two syllable context, and so we could end up with some more pronounceable names. It also just made sense to scale up the dimension of embeddings and neurons to give a little more nuance to the relationships between the different context blocks. English has so many different rules when it comes to vowels and silent letters and so I felt like we needed to give enough room for 3-4 degrees of freedom for each character in the context block, and therefore needed more neurons in the net to account for those extra dimensions. running the model for more steps just allows the convergence to happen. I don't know if it could get much better after more steps but this took 6-7 minutes to run so I think i squeezed all that I could out of these hyperparams.
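The decay described above (learning rate exponent moving linearly from -1 to -3 over 300k steps) can be sketched as a small schedule function. The function name and signature are hypothetical, chosen only for this illustration.

```python
def lr_at(step, total_steps=300_000, start_exp=-1.0, end_exp=-3.0):
    """Learning rate whose base-10 exponent moves linearly from start_exp to end_exp.

    Hypothetical helper for illustration; not code from the video or the comment.
    """
    frac = step / (total_steps - 1)  # 0.0 at the first step, 1.0 at the last
    return 10 ** (start_exp + frac * (end_exp - start_exp))


# The rate starts at 0.1, passes roughly 0.01 halfway, and ends at 0.001.
for step in (0, 150_000, 299_999):
    print(step, lr_at(step))
```

Because the exponent is linear, the rate itself shrinks geometrically: most of the training time is spent at small learning rates, which is one plausible reason the run settles to a lower loss than a single step-down would.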
@anveshicharuvaka2823 1 year ago
Hi Andrej, Even though I am already familiar with all this I still watch your videos for the pedagogical value and for learning how to do things. But, I still learn many new things about pytorch as well as how to think through things. The way you simplify complex stuff is just amazing. Keep doing this. You said on a podcast that you spend 10 hours for 1 hour of content, but you save 1000s of hours of frustration and make implementing ideas a little bit easier.
@Sovereign589 6 months ago
great and true sentence: "You said on a podcast that you spend 10 hours for 1 hour of content, but you save 1000s of hours of frustration and make implementing ideas a little bit easier."
@matjazmuc-7124 1 year ago
I just want to say thank you Andrej, you are the best ! I've spent the last 2 days going over the first 3 videos (and completing the exercises), I must say that this is by far the best learning experience I ever had. The quality of the lectures is just immeasurable, in fact you completely ruined how I feel about lectures at my University.
@ahmedivy 1 year ago
where are the exercises?
@sam.rodriguez 8 months ago
@ahmedivy Check the comments from Andrej in each video
@allahm-ast3mnlywlatstbdlny164 8 months ago
@ahmedivy description
@shaypeleg7812 8 months ago
@ahmedivy Also asked myself, then found them in the video description: Exercises: - E01: Tune the hyperparameters of the training to beat my best validation loss of 2.2 - E02: I was not careful with the initialization of the network in this video. (1) What is the loss you'd get if the predicted probabilities at initialization were perfectly uniform? What loss do we achieve? (2) Can you tune the initialization to get a starting loss that is much more similar to (1)? - E03: Read the Bengio et al 2003 paper (link above), implement and try any idea from the paper. Did it work?
@ncheymbamalu4906 11 months ago
Much thanks, Andrej! I increased the embedding dimension to 5 from 2, initialized the model parameters from a uniform distribution [0, 1) instead of a standard normal distribution, increased the batch size to 128, and used the sigmoid activation for the hidden layer instead of the hyperbolic tangent, and was able to get the negative log-likelihood for the train and validation sets down to ~2.15, respectively.
@anangelsdiaries 8 days ago
I am so happy people like you exist. Thank you very much for this video series.
@manuthegameking 1 year ago
This is amazing!!! I am an undergraduate student researching deep learning. This series is a gold mine. The attention to detail as well as the intuitive explanations are amazing!!
@rezathr8968 1 year ago
Really enjoyed watching these lectures so far :) also +1 for the PyTorch internals video (@25:36)
@julian1000 1 year ago
This is absolutely amazing stuff, thank you so much for putting this out for FREE!!!! I thought your name looked familiar and then I remembered you sparked my initial interest in NNs with "the unreasonable effectiveness of RNNs". It was SO fun and fascinating to just toss any old random text at it and see what it did! Can't believe how much progress has happened so quickly. Really really excited to get a better practical understanding of NNs and how to program them. Thank you again!
@rmajdodin 1 year ago
53:20 To split the data into training, development and test sets, one can also use torch.tensor_split:
n1 = int(0.8 * X.shape[0])
n2 = int(0.9 * X.shape[0])
Xtr, Xdev, Xts = X.tensor_split((n1, n2), dim=0)
Ytr, Ydev, Yts = Y.tensor_split((n1, n2), dim=0)
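A runnable version of the tensor_split idea on toy tensors follows. The toy data and the shuffle step are assumptions added for this sketch: tensor_split cuts the rows in order, so the examples should be shuffled beforehand if the original ordering is not already random.

```python
import torch

# Toy stand-ins for the real dataset: 10 examples, 2 context positions each
X = torch.arange(20).view(10, 2)
Y = torch.arange(10)

# Assumption of this sketch: shuffle examples first, since tensor_split cuts in order
g = torch.Generator().manual_seed(0)  # arbitrary seed
perm = torch.randperm(X.shape[0], generator=g)
X, Y = X[perm], Y[perm]

n1 = int(0.8 * X.shape[0])  # end of the training split
n2 = int(0.9 * X.shape[0])  # end of the dev split

# Splitting at indices (n1, n2) yields rows [0, n1), [n1, n2), [n2, end)
Xtr, Xdev, Xte = X.tensor_split((n1, n2), dim=0)
Ytr, Ydev, Yte = Y.tensor_split((n1, n2), dim=0)

print(Xtr.shape, Xdev.shape, Xte.shape)
```

Note that this splits individual examples rather than whole words, so contexts from the same name can land in different splits; splitting the word list before building the examples avoids that.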
@moalimus 1 year ago
Can't believe the value of these lecture and how helpful they are, you are literally changing the world. Thanks very much for your effort and knowledge
@bebebewin 1 year ago
This is perhaps the best series on KZbin I have ever seen - Without a doubt I can't recall the last time a 1 hour video was able to teach me so much!
@cangozpinar 1 year ago
Thank you very much for taking your time to go step by step whether it be torch API, your code or the math behind things. I really appreciate it.
@cliffanthony1 1 year ago
Thanks. Seeing things coded from scratch clears up any ambiguities one may have when reading the same material in a book.
@bassRDS 1 year ago
Thank you Andrej! I find your videos not only educational, but also very entertaining. Learning new things is exciting!
@vil9386 4 months ago
Can't thank you enough. It's such a satisfying feeling to understand the logic under the ML models clearly. Thank you!
@tylerxiety 4 months ago
Love all the tips and explanations on pytorch, training efficiency, and educational purposed errors. I was writing both code and notes and rewatching and enjoyed it and felt having a fruitful day after finished. It's like I was learning with a kind and insightful mentor sitting next to me. Thanks so much Andrej.
@vulkanosaure 1 year ago
Thank you so much, this is gold, I'm watching all of this thoroughly, pausing the video a lot to wrap my head around those tensor manipulations (I didn't know anything about python/numpy/pytorch). I'm also really inspired by how you quickly plot data to get important insights, I'll do that too from now on
@mbpiku 1 year ago
Never understood the basics of hyper parameter tuning so well. A sincere Thanks for the foundation and including that part in this video.
@ShouryanNikam 5 months ago
What a time to be alive, someone as smart as Andrej giving away for free probably the best lectures on the subject. Thanks so much!!!
@myanxiouslife 1 year ago
So cool to see the model learn through the embedding matrix that vowels share some similarity, 'q' and '.' are outlier characters, and so on!
@yiannigeorgantas1551 1 year ago
Whoa, you’re putting these out quicker than I can go through them. Thank you!
@cristobalalcazar5622 1 year ago
This lecture compresses an insane amount of wisdom into 1.15 hrs! Thanks
@grayboywilliams 1 year ago
So many insights, I’ll have to rewatch it again to retain them all. Thank you!
@shaypeleg7812 8 months ago
hi Andrej, Your lectures are the best ones I saw. It's amazing you take complex ideas and explain them in such a level that even beginners understand. Thank you for that.
@DanteNoguez 1 year ago
I love the simplicity of your explanations. Thanks a lot!
@alexandermedina4950 1 year ago
This is priceless, you have such a low and high level understanding of the topic, that's just amazing.
@zmm978 1 year ago
I watched and followed many such courses, yours are really special, easy to understand yet very indepth, with many useful tricks.
@RickeyBowers 1 year ago
Such a valuable resource to help people in other fields get up to speed on these concepts. Thank you.
@timilehinfasipe485 1 year ago
Thank you so much for this, Andrej !! I’m really learning and enjoying this
@varunjain8981 1 year ago
Beautiful......The explanation!!!! This builds the intuition to venture out in unknown territories. Thanks from the bottom of my heart.
@Yenrabbit 1 year ago
Really great series of lessons! Lots of gems in here for any knowledge level. PS: Increasing the batch size and lowering the LR a little does result in a small improvement in the loss. Throwing out 2.135 as my test score to beat :)
@not_elm0 1 year ago
This educational vid will reach more students than a regular teaching job at a regular school. Thanks for sharing & giving back👍
@nginfrared 5 months ago
Your lectures make me feel like I am in an AI Retreat :). I come out so happy and enriched after each lecture.
@user-pu7nq9jp6l 1 year ago
I believe that at 49:22 the losses and the learning rates are misaligned. The first loss (derived from completely random weights) is computed before the first learning rate is used, and therefore the first learning rate should be aligned with the second loss. You can fix this with the following snippet:
lri = lri[:-1]
lossi = lossi[1:]
Also, thank you so much for these amazing lectures
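The alignment point can be illustrated on a toy problem. The sketch below sweeps learning rates while minimizing f(w) = w², which is an assumption made here for simplicity and stands in for the real network; the names lri and lossi follow the comment.

```python
# Toy learning-rate sweep: minimize f(w) = w * w with plain gradient descent.
# The quadratic is a stand-in for the real model (an assumption of this sketch).
lrs = [10 ** (-3 + 3 * i / 99) for i in range(100)]  # 100 rates from 1e-3 up to 1

w = 5.0
lri, lossi = [], []
for lr in lrs:
    loss = w * w      # measured BEFORE the update that uses this lr
    grad = 2 * w
    w -= lr * grad    # the effect of this lr only shows up in the NEXT loss
    lri.append(lr)
    lossi.append(loss)

# Pair each learning rate with the loss measured after it was applied
lri_aligned = lri[:-1]
lossi_aligned = lossi[1:]

print(len(lri_aligned), len(lossi_aligned))
```

With many steps in the sweep the shift is a single position, so the resulting plot barely changes; the correction mainly matters for short sweeps.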
@oshaya 1 year ago
Amazing, astounding… Andrej, you’re continuing your revolution for people’s education in ML. You are the “Che” of AI.
@isaacfranklin2712 1 year ago
Quite an ominous comparison, especially with Andrej working at OpenAI now.
@jeevan288 1 year ago
what does "Che" mean?
@gregoriovilardo 9 months ago
@jeevan288 is a murderer that fought "for" Cuba. "Che Guevara"
@pedroaugustoribeirogomes7999 1 year ago
Please create the "entire video about the internals of pytorch" that you mentioned in 25:40. And thank you so much for the content, Andrej !!
@alexandertaylor4190 1 year ago
I feel pretty lucky that my intro to neural networks is these videos. I've wanted to dive in for a while and I'm hooked already. Absolutely loving this lecture series, thank you, I can't wait for more! I'd love to join the discord but the invite link seems to be broken
@minhajulhoque2113 1 year ago
Such an amazing educational video. Learned a lot. Thanks for taking the time and explaining many concepts so clearly.
@shreyasdaniel627 1 year ago
You are amazing! Thank you so much for all your work :) You explain everything very intuitively!!! I was able to achieve a train loss of 2.15 and test loss of 2.17 with block_size = 4, 100k iterations and embed dimension = 5.
@Joseph_Myers 1 year ago
I wanted to let you know I listened to the podcast with Lex Fridman and I now understand how much of a rockstar you are in the Artificial Intelligence space. Like many others I appreciate you and all you are doing to push forward with this incredible technology. Thank you.
@ayogheswaran9270 1 year ago
Thank you, Andrej!! Thanks a lot for all the efforts you have put in❤
@svassilev 1 year ago
Great stuff @AndrejKarpathy! I actually was typing in parallel in my own notebook, as I was training on a different dataset. Amazing!
@kordou 3 months ago
Andrej, thank you for this great series of lectures. You are a great educator! 100% GOLD material to learn
@TheEbbemonster 1 year ago
I really enjoy these videos! A little note is that to run through the tutorial, it requires a bit of memory, so it would be nice with an early discussion of batching :) I run out of memory when calculating the loss, so had to reduce the sample size significantly.
@SandeepAitha 2 months ago
Watching your videos constantly reminds me of "There are no bad students but only bad teachers"
@DrKnowitallKnows 1 year ago
Thank you for referencing and exploring the Bengio paper. It's great to get academic context on how models like this were developed, and very few people actually do this in contexts like this.
@mdmusaddique_cse7458 5 months ago
I was able to achieve a loss of 2.14 on the test set.
Some hyperparameters:
- Neurons in hidden layer: 300
- Batch size: 64 for first 400k iterations then 32 for rest
- Total iterations: 600,000
Thank you for uploading such insightful explanations. I really appreciate that you explained how things work under the hood and insights of PyTorch's internals.
@ilyas8523 1 year ago
underrated series. Very informative. Watching this series before jumping into the Chatbot video. I am currently building my own poem-gpt
@punto-y-coma7890 1 month ago
That was really awesome explanation by all means!! thank you very much Andrej for educating us :)
@8eck 1 year ago
I like how Andrej is operating with tensors, that's super cool. I think that we need a separate video about that from Andrej. It is super important.
@jeffreyzhang1413 1 year ago
One of the best lectures on the fundamentals of ML
@arunmanoharan6329 1 year ago
Thank you so much Andrej! This the best NN series. Hope you will create more videos:)
@gilad13886 1 month ago
Amazing video and series! Thank you. Small correction to the build_makemore_mlp.ipynb colab: it assumes the embedding size is 2, but during the lecture it was eventually changed to 10, so emb.shape will be (32, 3, 10) and h.shape (32, 200). Just FYI if you're running it and get confused
@ShadoWADC 1 year ago
Thank you for doing this. This is truly a gift for all the learners and enthusiasts of this field!
@Democracy_Manifest 9 months ago
What an amazing teacher you are. Thank you
@akashkantthakur 1 year ago
This is amazing and Informative. Thank you for the series.
@vincentyovian5480 1 year ago
I've never been this excited for a lecture video before
@avishakeadhikary 4 months ago
It is an absolute honor to learn from the very best. Thanks Andrej.
@joshwolff4592 1 year ago
The amount of times in college we used the PyTorch "view" function with ZERO explanation. And your explanation is not only flawless, you even make the explanation itself look easy! Thank you so much
@arildboes 11 months ago
As a programmer trying to learn ML, this is gold!
@dreamtheater1999 5 months ago
Great, very didactical content. Thanks a lot for all the effort you put on this!
@chineduezeofor2481 9 hours ago
Awesome tutorial. Thank you Andrej!
@danielk2055 1 year ago
Awesome explanation. Can’t wait for the next part and ultimately the transformer one.
@gleb.timofeev 1 year ago
At 45:45 I was waiting for Karpathy's constant to appear. Thank you for the lecture, Andrej
@rookyvilakkumadathil8356 7 months ago
Thank You Andrej for the excellent training
@hasanhuseyinyurdagul5403 1 year ago
You are the best as always, thanks for the content
@alexanderliapatis9969 1 year ago
I have been into neural nets for the last 2 years and I think I know some stuff about them (the basics at least), and I have taken a couple of courses about ML/DL. I was always wondering why I need a val and a test set, why test the model on 2 different sets of the same data. So hearing that the val set is for tuning hyperparameters is a first for me, as is the fact that you use the test set only a few times in order to avoid overfitting on it as well. I am amazed by the content of your videos and the way you teach things. Keep up the good work, you are making the community a better place.
@tarakivu8861 1 year ago
I dont understand the overuse of the test-set. I mean we are only forward-passing that to evaluate the performance, so we arent learning anything? I can maybe see it when the dev sees the result and changes the network to better fit the test-case? But thats good isnt it?
@debdeepsanyal9030 14 days ago
@tarakivu8861 For the people who will maybe stumble upon this comment later and have the same doubt, here's an intuition that gives me a pretty thorough understanding. Say you are studying for an exam, and you use your textbooks for learning (note the use of learning here as well). Now, you want to know how well you're doing with the content you're learning from the textbooks, so you take a mock exam, which kind of replicates the feeling of the final exam you're going to take. You note the mistakes or errors you are making on the mock paper, you keep studying the textbooks, and you take the mock test over and over again, periodically. After some time, you have an estimate of how well you are going to do in the final exam based on the results you are getting on the mock exam. Here, learning from the textbooks is the model training on the train set. The mock exam is the validation set. The final exam (which you take just once) is like the test set. Note that the dev set doesn't change the network's weights in any way, it just gives us an estimate of how the model can perform on the test set. If you are performing badly on the mock test, you know the final exam will not go better.
@raziberg 1 year ago
Thanks for another video I wait for them now and am really happy when they come out
@flwi 1 year ago
Great lecture! I also appreciate the effort of putting it on Google Colab. Way easier to access for people not familiar with python and its adventurous ecosystem. Recently got a new mac with an m1 processor and it took me a while to get tensorflow to run locally with gpu support, since I'm no python expert and therefore not familiar with all their package managers :-)
@ivomarbritosoares2568 1 year ago
The world needs to know about this youtube series. I already published it to my network on linkedin.
@roomo7time 1 year ago
true true gift to humanity. huge thanks, really
@anasshaikhany9733 1 year ago
Astounding Sir ! I am very thankful to you, much respect 😁
@user-fv2qi7ce5w 1 year ago
It was awesome, as usual! And all of us vigorously waiting the next one lecture :)
@lonnybulldozer8426 1 year ago
It's not waiting that you doing vigorously, it's stroking yourself.
@Gredias 1 year ago
Can't wait for the next one, these are fantastic.
@nabeelkhaan 1 year ago
Thank you for doing this. Please upload rest of the videos of the series soon.
@aangeli702 10 months ago
Andrej is the type of person that could make a video titled "Building a 'hello world' program in Python" which a 10x engineer could watch and learn something from it. The quality of these videos is unreal, please do make a video on the internals of torch!
@plashless3406 9 months ago
Thanks for taking the time from your research.
@sanjaybhatikar 7 months ago
Wow! Mind-blowing lecture.
@akshay_pachaar 1 year ago
Thank you so much Andrej!! 🙏 Can't wait to watch the next one and go all the way to transformers. 💙
@abir95571 1 year ago
This is what true public service looks like ... kudos Andrej :)
@owendorsey5866 1 year ago
Part 2 already!!!! Thank you 🙏
@louiswang538 1 year ago
29:20 We can also use torch.reshape() to get the right shape for W. However, there is a difference between torch.view and torch.reshape. TL;DR: If you just want to reshape tensors, use torch.reshape. If you're also concerned about memory usage and want to ensure that the two tensors share the same data, use torch.view.
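The view/reshape distinction can be checked directly on a small tensor. This is a minimal sketch with made-up values: view never copies and raises an error when the memory layout does not permit it, while reshape returns a view when it can and silently falls back to a copy otherwise.

```python
import torch

a = torch.arange(6)

# view: same underlying storage, so writes through the view are visible in `a`
v = a.view(2, 3)
v[0, 0] = 100
shares_storage = a[0].item() == 100

# reshape on a contiguous tensor also returns a view (no copy is made)
same_memory = v.reshape(6).data_ptr() == a.data_ptr()

# A transposed tensor is non-contiguous: view refuses, reshape copies
t = v.t()  # shape (3, 2), strides no longer match a flat layout
try:
    t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

r = t.reshape(6)  # works, but `r` is a fresh copy
r[0] = -1         # does not touch `a`
copy_is_independent = a[0].item() == 100

print(shares_storage, same_memory, view_failed, copy_is_independent)
```

Using view where sharing is intended makes an accidental copy impossible, since the call fails loudly instead of allocating new memory.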
@Koyaanisqatsi2000 9 months ago
Amazing content. Thank you!
@ernietam6202 10 months ago
Wow! I have longed to learn about hyper-parameters and training in a nutshell. Another Aha moment for me in Deep Learning. Thanks a trillion.
@yukselkapan9996 1 year ago
Thank you for the lectures! @59:01 Made me chuckle
@american-professor 5 months ago
I cannot believe word2vec was invented in 2003 instead of 2014
@sergeyyatskevitch3617 1 year ago
Brilliant! Thank you, Andrej!
@ncheymbamalu4013 1 year ago
Andrej, I was able to get a train and validation cross-entropy of 2.0243 and 2.1333, respectively. The hyperparameters that were changed were the number of characters used to predict the next character (from 3 to 5), the length of each embedding vector (from 2 to 27, i.e., the number of tokens), and the batch size (from 32 to 128). Also, after optimizing the learning rate, I took the average of the 10 learning rates that produced the lowest cross-entropy and trained the model with it. Finally, I decreased that 'averaged' learning rate even further by two orders of magnitude and trained the model one last time. In short, a lot of experimentation was required. Haha.
@tvcomputer1321 11 months ago
really cool, thank you for putting this together. checking out the google collab now
@mj2068 10 months ago
it really can't get any awesomer than this.
@ianfukushima1316 1 year ago
Thanks! Awesome lecture!
@pastrop2003 1 year ago
On top of everything else, this is absolutely the best documentation & explainer of PyTorch. This is infinitely better than the PyTorch documentation. In fact, it should be a must-see video for the PyTorch team to show them how to write good documentation. Meta should pay Andrej any fee he asks for the rights to use this video in the PyTorch docs...Thank you Andrej!
@sunderrajan6172 1 year ago
Awesome as always and thanks for the colab. Now I can use my phone to run
@phanindraparashar8930 1 year ago
Can't wait for more !!!!! Amazing 👌 👏 🙀
@jurischaber6935 1 year ago
Thanks for this video. Learned a lot.😀