Building makemore Part 5: Building a WaveNet

146,846 views

Andrej Karpathy

We take the 2-layer MLP from the previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of torch.nn, what it is and how it works under the hood, and of what a typical deep learning development process looks like (a lot of reading of documentation, keeping track of multidimensional tensor shapes, moving between jupyter notebooks and repository code, ...).
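For reference, a minimal sketch of that tree-like structure in plain PyTorch (a sketch only: layer sizes are illustrative, and the batchnorm layers used in the video are omitted):

```python
import torch
import torch.nn as nn

class FlattenConsecutive(nn.Module):
    """Fuse n consecutive time steps into the channel dimension."""
    def __init__(self, n):
        super().__init__()
        self.n = n
    def forward(self, x):
        B, T, C = x.shape
        x = x.view(B, T // self.n, C * self.n)
        return x.squeeze(1) if x.shape[1] == 1 else x

# a context of 8 characters is fused pairwise: 8 -> 4 -> 2 -> 1
model = nn.Sequential(
    nn.Embedding(27, 10),                                   # 27 chars, 10-dim embeddings
    FlattenConsecutive(2), nn.Linear(20, 68), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(136, 68), nn.Tanh(),
    FlattenConsecutive(2), nn.Linear(136, 68), nn.Tanh(),
    nn.Linear(68, 27),                                      # logits over the next character
)

x = torch.randint(0, 27, (32, 8))   # a batch of 32 contexts of 8 characters
print(model(x).shape)               # torch.Size([32, 27])
```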
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- colab notebook: colab.research.google.com/dri...
- my website: karpathy.ai
- my twitter: / karpathy
- our Discord channel: / discord
Supplementary links:
- WaveNet 2016 from DeepMind arxiv.org/abs/1609.03499
- Bengio et al. 2003 MLP LM www.jmlr.org/papers/volume3/b...
Chapters:
intro
00:00:00 intro
00:01:40 starter code walkthrough
00:06:56 let’s fix the learning rate plot
00:09:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
implementing wavenet
00:17:11 overview: WaveNet
00:19:33 dataset: bump the context size to 8
00:19:55 re-running baseline code on block_size 8
00:21:36 implementing WaveNet
00:37:41 training the WaveNet: first pass
00:38:50 fixing batchnorm1d bug
00:45:21 re-training WaveNet with bug fix
00:46:07 scaling up our WaveNet
conclusions
00:46:58 experimental harness
00:47:44 WaveNet but with “dilated causal convolutions”
00:51:34 torch.nn
00:52:28 the development process of building deep neural nets
00:54:17 going forward
00:55:26 improve on my loss! how far can we improve a WaveNet on this data?

Comments: 172
@khisrowhashimi · 1 year ago
I love how we are all so stressed and worried that Andrej might grow apathetic to his YouTube channel, so everyone wants to be extra supportive 😆 Really shows what an awesome communicator he is.
@TL-fe9si · 1 year ago
I was literally thinking about that when I saw this comment
@jordankuzmanovik5297 · 7 months ago
Unfortunately he did it :(
@isaac10231 · 6 months ago
@jordankuzmanovik5297 Hopefully he comes back.
@PollPoII · 1 year ago
This series is the most interesting resource for DL I've come across, being a junior ML engineer myself. To be able to watch such a knowledgeable domain expert as Andrej explaining everything in the most understandable ways is a real privilege. A million thanks for your time and effort; looking forward to the next one and hopefully many more.
@GlennGasner · 10 months ago
I really, really appreciate you putting in the work to create these lectures. I hope you can really feel the weight of the nearly hundred thousand humans who pushed through 12 hours of lectures because you've made it accessible. And that's just so far. These videos are such an incredible gift. Half of the views are me, because I needed to watch each one so many times to understand what's happening, because I started from so little. Also, it's super weird how different you are from other YouTubers and yet how likable you become as a human during this series. You are doing this right, and I appreciate it.
@crayc3 · 1 year ago
A notification for a new Andrej video feels like a new season of Game of Thrones just dropped at this point.
@nervoushero1391 · 1 year ago
As an independent deep learning undergrad student, your videos help me a lot. Thank you Andrej. Never stop this series.
@anrilombard1121 · 1 year ago
We're on the same road!
@tanguyrenaudie1261 · 1 year ago
Love the series as well! Coding through all of it. Would love to get together with people to replicate deep learning papers, like Andrej does here, to learn faster and not by myself.
@raghavravishankar6262 · 11 months ago
@tanguyrenaudie1261 I'm in the same boat as well. Do you have a Discord or something where we can talk further?
@raghavravishankar6262 · 11 months ago
@Anri Lombard @ Nervous Hero
@timelapseguys4042 · 1 year ago
Andrej, thanks a lot for the video! Please do not stop continuing the series. It's an honor to learn from you.
@maestbobo · 1 year ago
Best resource by far for this content. Please keep making more of these; I feel I'm learning a huge amount from each video.
@rajeshparekh · 2 months ago
Thank you so much for creating this video lecture series. Your passion for this topic comes through so vividly in your lectures. I learned so much from every lecture and especially appreciated how the lectures started from the foundational concepts and built up to the state-of-the-art techniques. Thank you!
@sakthigeek2458 · 29 days ago
Learned a lot of practical tips and theoretical knowledge of why we do what we do and also the history of how Deep Learning evolved. Thanks a lot for this series. Requesting you to continue the series.
@hintzod · 1 year ago
Thank you so much for these videos. I really enjoy these deep dives, things make so much more sense when you're hand coding all the functions and running through examples. It's less of a black box and more intuitive. I hope this comment will encourage you to keep this going!
@vivekpadman5248 · 1 year ago
Absolutely love this series Andrej sir... It not only teaches me stuff but gives me confidence to work even harder to share whatever I know already.. 🧡
@NarendraBME · 2 months ago
So far THE BEST lecture series I have come across on YouTube. Alongside learning about neural networks in this series, I have learned more PyTorch than from watching a 26-hour PyTorch video series from another YouTuber.
@stracci_5698 · 11 months ago
This is truly the best dl content out there. Most courses just focus on the theory but lack deep understanding.
@1knmd · 1 year ago
Every time a new video is out it's like Christmas for me! Please don't stop doing this, best ML content out there.
@mipmap256 · 1 year ago
Can't wait for part 6! So clear and I can follow step by step. Thanks so much
@cktse_jp · 1 month ago
Just wanna say thank you for sharing your experience -- love this from-scratch series starting from first principles!
@stanislawcronberg3271 · 1 year ago
My favorite way to start a Monday morning is to wake up to a new lecture in Andrej's masterclass :)
@aurelienmontmejat1077 · 1 year ago
This is the best deep learning course I've followed! Even better than the one on Coursera. Thanks!
@ishaanrajpal273 · 1 year ago
My best way to learn is to learn from one of the most experienced people in the field. Thanks for everything Andrej
@panagiotistseles1118 · 5 months ago
Totally amazed by the amount of good work you put in. You've helped a lot of people Andrej. Keep up the good work
@aanchalagarwal6886 · 9 months ago
Thank you Andrej for creating this series. It has been very helpful. I just hope you get the time to continue with it.
@eustin · 1 year ago
Yes! I've been telling everyone about these videos. I've been checking every day whether you posted the next video. Thank you.
@Zaphod42Beeblebrox · 1 year ago
I experimented a bit with the MLP with 1 hidden layer and managed to scale it up to your fancy hierarchical model. :) Here is what I got:
MLP (105k parameters): block_size = 10, emb_dim = 18, n_hidden = 500, lr = 0.1 (same learning rate decay as in the video), epochs = 200000, mini_batch = 32, lambd = 1 (added L2 regularization), seed = 42.
Training error: 1.7801, dev error: 1.9884, test error: 1.9863 (I checked the test error only because I was worried that somehow I had overfitted the dev set).
Some examples generated from the model that I kinda liked: Angelise, Fantumrise, Bowin, Xian, Jaydan
@oklm2109 · 1 year ago
What's the formula to calculate the number of parameters of an MLP model?
@amgad_hasan · 10 months ago
@oklm2109 You just add up the trainable parameters of every layer. If the model contains only fully connected layers (aka Linear in PyTorch or Dense in TF), the number of parameters for each layer is:
n_weights = n_in * n_hidden_units
n_biases = n_hidden_units
n_params = n_weights + n_biases = (1 + n_in) * n_hidden_units
where n_in is the number of inputs (think of it as the number of outputs/hidden units of the previous layer). This formula is valid for linear layers; other types of layers may have different formulas.
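As a sanity check, a small sketch applying this formula to the model reported above (the 27-character vocabulary and the 27x18 embedding table are assumptions matching the makemore setup):

```python
def linear_params(n_in, n_out):
    # weights plus biases of one fully connected layer
    return (n_in + 1) * n_out

# block_size=10, emb_dim=18 -> 10*18 = 180 inputs to the hidden layer,
# 500 hidden units, 27 output characters, plus the 27x18 embedding table
total = 27 * 18 + linear_params(10 * 18, 500) + linear_params(500, 27)
print(total)  # 104513, i.e. ~105k parameters, matching the report above
```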
@glebzarin2619 · 8 months ago
I'd say it's slightly unfair to compare models with different block sizes, because the block size not only influences the number of parameters but also the amount of information given as input.
@brianwhite9137 · 1 year ago
Very grateful for these. An early endearing moment was in the Spelled-Out Intro when you took a moment to find the missing parentheses for 'print.'
@ShinShanIV · 1 year ago
Thank you so much Andrej for the series, it helps me a lot. You are one of the reasons I was able to get into ML and build a career there. I admire your teaching skills! I didn't get why the sequence dim has to be part of the batch dimension, and I didn't hear Andrej talk about it explicitly, so here is my reasoning: the sequence dimension is an additional batch dimension because the output before batch norm is created by a linear layer with (32, 4, 20) @ (20, 68) + (68), which performs the matrix multiplication only on the last dimension (.., .., 20), in parallel over the first two. So the matrix multiplication is performed 32 * 4 times with (20) @ (20, 68). Thus it's the same as a (128, 20) @ (20, 68) calculation, where 32 * 4 = 128 is the batch dimension. So the sequence dimension is effectively treated as a "batch dimension" in the linear layer and must be treated that way in batch norm too. (would be great if someone could confirm)
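A quick shape experiment along the lines of this reasoning (a sketch, not the video's exact code):

```python
import torch

x = torch.randn(32, 4, 20)                 # (batch, chunks, features)
W, b = torch.randn(20, 68), torch.randn(68)

out = x @ W + b                            # matmul acts on the last dim only
print(out.shape)                           # torch.Size([32, 4, 68])

# identical to folding the sequence dim into the batch dim first
out2 = (x.view(-1, 20) @ W + b).view(32, 4, 68)
print(torch.allclose(out, out2))           # True

# so batchnorm statistics should reduce over dims (0, 1), keeping 68 channels
mean = out.mean(dim=(0, 1), keepdim=True)
print(mean.shape)                          # torch.Size([1, 1, 68])
```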
@ephemer · 1 year ago
Thanks so much for this series, I feel like this is the most important skill I might ever learn and it’s never been more accessible than in your lectures. Thank you!
@timowidyanvolta · 8 months ago
Please continue, I really like this series. You are an awesome teacher!
@sr10009 · 1 year ago
Thanks again Andrej! Love these videos! Dream come true to watch and learn from these! Thanks for all you do to help people! Your helpfulness ripples throughout the world! Thanks again! lol
@art4eigen93 · 8 months ago
Please continue this series, Sir Andrej. You are the savior!
@thanikhurshid7403 · 1 year ago
Andrej you are the absolute greatest. Keep making your videos. Anxiously waiting to implement Transformers with you
@WarrenLacefield · 1 year ago
Enjoying these videos so much. Using them to refresh most of what I've forgotten about Python and to begin playing with PyTorch. Last time I did this stuff myself was with C# and CNTK. Now going back to rebuild and rerun old models and data (much faster, even "better" results). Thank you.
@Leon-yp9yw · 1 year ago
I was worried I was going to have to wait a couple of months for the next video as I finished part 4 just last week. Can't wait to get into this one, thanks a lot for this series Andrej
@milankordic · 1 year ago
Was looking forward to this one. Thanks, Andrej!
@sunderrajan6172 · 1 year ago
Beautifully explained as always - thanks. It shows how much passion you have to come up with these awesome videos. We are all blessed!
@ERRORfred2458 · 9 months ago
Andrej, thanks for all you do for us. You're the best.
@ThemeParkTeslaCamping360 · 1 year ago
Incredible video, this helps a lot. Thank you for the videos; I especially loved your Stanford videos on machine learning from scratch, without any libraries like TensorFlow and PyTorch. Keep going and thank you for helping hungry learners like me!!! Cheers 🥂
@nikitaandriievskyi3448 · 1 year ago
I just found your YouTube channel, and this is just amazing, please do not stop doing these videos, they are incredible
@VasudevaK · 1 year ago
Sir, it's a pleasure to learn from you! Thank you so much. I'll be meeting you one day in person, just to thank you.
@timandersen8030 · 1 year ago
Thank you, Andrej! Looking forward to the rest of the series!
@yanazarov · 1 year ago
Absolutely awesome stuff Andrej. Thank you for doing this.
@kshitijbanerjee6927 · 9 months ago
Hey Andrej! I hope you continue and give us the RNN, GRU & Transformer lectures as well! The ChatGPT one is great, but I feel like we missed the story in the middle and jumped ahead because of ChatGPT.
@SupeHero00 · 9 months ago
The ChatGPT lecture is the Transformer lecture.. And regarding RNNs, I don't see why anyone would still use them...
@kshitijbanerjee6927 · 9 months ago
Transformers, yes. But it's not like anyone will build bigrams either; it's about learning concepts like BPTT from the roots.
@SupeHero00 · 9 months ago
@kshitijbanerjee6927 Bigrams and MLPs help you understand Transformers (which is the SOTA).. Anyway IMO it would be a waste of time creating a lecture on RNNs, but if the majority want it, then maybe he should do it.. I don't care
@kshitijbanerjee6927 · 9 months ago
Fully disagree that it's not useful. I think the concepts of how they came up with unrolling and BPTT, and the gates used to solve long-term memory problems, are invaluable for appreciating and understanding why Transformers are such a big deal.
@attilakun7850 · 3 months ago
@@SupeHero00 RNNs are coming back due to SSMs like Mamba.
@kimiochang · 1 year ago
Finally completed this one. As always, thank you Andrej for your generosity! Next I will practice through all five parts again and learn how to accelerate the training process by using GPUs.
@meisherenow · 3 months ago
How cool is it that anyone with an internet connection has access to such a great teacher? (answer: very)
@flwi · 1 year ago
Great series! I really enjoy the progress and good explanations.
@AlienLogic775 · 1 year ago
Thanks so much Andrej! Hope to see a Part 6
@AndrewOrtman · 1 year ago
When I did the mean() trick at ~8:50 I let out an audible gasp! That was such a neat trick, going to use that one in the future
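For anyone searching for it, this presumably refers to the chunked averaging from the "let's fix the learning rate plot" chapter; a sketch with a synthetic loss curve standing in for the real lossi list:

```python
import torch
import matplotlib.pyplot as plt

# stand-in for lossi, the list of per-step training losses
steps = torch.arange(200_000)
lossi = (2.5 * torch.exp(-steps / 50_000.0) + 0.3 * torch.rand(200_000)).tolist()

# view(-1, 1000) groups every 1000 consecutive steps into a row,
# and mean(1) collapses each row into one smoothed value
smoothed = torch.tensor(lossi).view(-1, 1000).mean(1)
plt.plot(smoothed)
plt.show()
```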
@4mb127 · 1 year ago
Thanks for continuing this fantastic series.
@michaelmuller136 · 2 months ago
That was a great playlist, easy to understand and very helpful, thank you very much!!
@kindoblue · 1 year ago
Every video another solid pure gold bar
@veeramahendranathreddygang1086 · 1 year ago
Thank you Sir. Have been waiting for this.
@creatureOfnature1 · 1 year ago
Much appreciated, Andrej. Your tutorials are gems!
@pablofernandez2671 · 1 year ago
Andrej, we all love you. You're amazing!
@ayogheswaran9270 · 11 months ago
@Andrej thank you for making this. Please continue making such videos; it really helps beginners like me. If possible, could you please make a series on how actual development and production is done.
@Leo-sy4vu · 1 year ago
Thank you so much for the series. I recently started it and it's the best thing on all of YouTube. Keep it up
@kemalware4912 · 1 year ago
Deliberate errors in the right spots... Your lectures are great.
@thehazarika · 1 year ago
This is philanthropy! I love you man!
@kaenovama · 1 year ago
Thank you! Love the series! Helped me a lot with my learning experience with PyTorch
@EsdrasSoutoCosta · 1 year ago
Awesome! Well explained and clear what's being done. Please keep making these fantastic videos!!!
@vivekpandit7417 · 1 year ago
Been waiting for a while. Thankyouuu !!
@aidanbraski · 3 months ago
great video, been learning a ton from you recently. thank you andrej!
@polloramirez · 1 year ago
Great content, Andrej! Keep them coming!
@wholenutsanddonuts5741 · 1 year ago
Can't wait for this next step in the process!
@fatihveyselnurcin · 1 year ago
Thank you Andrej, hope to see you again soon
@kaushik333ify · 8 months ago
Thank you so much for these lectures ! Can you please make a video on the “experimental harness” you mention at the end of the video? It would be super helpful and informative.
@BlockDesignz · 1 year ago
Please keep these coming!
@utkarshsingh1663 · 1 year ago
Thanks Andrej this course is awesome for base building..
@Abhishekkumar-qj6hb · 9 months ago
So I finished this lecture series. I was expecting RNN/LSTM/GRU, but they weren't there; still, I learnt a lot throughout and can definitely continue on my own. Thanks Andrej
@mellyb.1347 · 11 days ago
Loved this series. Would you please be willing to continue it so we get to work through the rest of CNN, RNN, and LSTM? Thanks!
@enchanted_swiftie · 8 months ago
The sentence that Andrej said at 49:26 made me realize something, something very deep. 🔥
@ivaninkorea · 2 months ago
Awesome series!
@DanteNoguez · 1 year ago
Thanks, Andrej, you're awesome!
@mobkiller111 · 1 year ago
Thanks for the content & explanations Andrej and have a great time in Kyoto :)
@nickgannon7466 · 1 year ago
You're crushing it, thanks a bunch.
@fajarsuharyanto8871 · 1 year ago
I rarely finish an entire episode. Hey Andrej 👌
@Joker1531993 · 1 year ago
I am subscribing, Andrej, just to support someone from our country, Slovakia. Even though I don't understand anything from the video >D
@Jack-vv7zb · 28 days ago
I love it when you say bye and then pop back up 😂😂😂😂
@georgehu8652 · 5 months ago
Best video ever
@philipwoods6720 · 1 year ago
SO EXCITED TO SEE THIS POSTED LEEEEETS GOOOOOOOO
@duonga.nguyen7826 · 1 year ago
Keep up your great work!
@arielfayol7198 · 10 months ago
Please don't stop the series😢
@repostcussion · 1 year ago
Amazing video! I'm absolutely loving the series, and following along in my own notebooks :) I'm curious about the first-layer embedding and what kinds of alternatives there are. More information could be given by increasing the size of the embedding to the size of the vocab to make it a one-hot. I imagine there should be more alternatives beyond this, maybe something that can use the int32 char ints directly?
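On the one-hot point: as shown earlier in the series, the embedding lookup is already equivalent to a one-hot input multiplied by a weight matrix, so widening the embedding to vocab size mostly just spends parameters differently; a quick sketch:

```python
import torch
import torch.nn.functional as F

vocab_size, emb_dim = 27, 10
C = torch.randn(vocab_size, emb_dim)          # the embedding table
ix = torch.randint(0, vocab_size, (32, 8))    # integer-encoded contexts

emb = C[ix]                                   # table lookup, shape (32, 8, 10)
onehot = F.one_hot(ix, num_classes=vocab_size).float()
emb2 = onehot @ C                             # the same thing as a matrix multiply
print(torch.allclose(emb, emb2))              # True
```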
@joekharris · 1 year ago
I'm learning so much. I really appreciate the lucidity and simplicity of your approach. I do have a question. Why not initialize running_mean and running_var to None and then set them on the first batch? That would seem to be a better approach than to start them at zero and would be consistent with making them exponentially weighted moving averages - which they are except for the initialization at 0.0.
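A sketch of the suggested change against a hand-rolled BatchNorm1d in the style of the lecture code (training path only; the eval path is omitted, and this class is illustrative rather than the video's exact code):

```python
import torch

class BatchNorm1d:
    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.gamma, self.beta = torch.ones(dim), torch.zeros(dim)
        self.running_mean = None   # the proposed change: set lazily on the first batch
        self.running_var = None

    def __call__(self, x):
        mean, var = x.mean(0), x.var(0)
        with torch.no_grad():
            if self.running_mean is None:
                self.running_mean, self.running_var = mean, var   # first batch seeds the averages
            else:
                m = self.momentum
                self.running_mean = (1 - m) * self.running_mean + m * mean
                self.running_var = (1 - m) * self.running_var + m * var
        return self.gamma * (x - mean) / torch.sqrt(var + self.eps) + self.beta
```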
@dimitristaufer · 1 year ago
Hi Andrej, thank you for taking the time to create these videos. In this video, for the first time, I'm having difficulties understanding what the model is actually learning. I've watched it twice and tried to understand the WaveNet paper, but that isn't really helping. Given an example input “emma“, the following character is supposed to be “.“, why is it beneficial to create a hidden layer to process “em“, “ma“, and then “emma“? Are we essentially encoding that given a 4 character word, IF the first two characters are “em“ it is likely that the 5th character is “.“, no matter what the third and fourth characters are? In other words, this implementation would probably assign a higher probability that “.“ is the fifth character after an unseen name, e.g. “emli“, simply because it starts with the bigram “em“? Thanks in advance, Dimitri.
@venkateshmunagala205 · 1 year ago
AI Devil is back . Thanks for the video @Andrej Karpathy.
@8eck · 1 year ago
Finally finished all the lectures, and I realized that I have a poor understanding of math, of dimensionality, and of operations over it. Anyway, thank you for helping out with the rest of the concepts and practices; I now better understand how backprop works, what it is doing, and what for.
@Ali-lm7uw · 1 year ago
Jon Krohn has a full playlist on algebra and calculus to go through before starting machine learning
@shouryamann7830 · 10 months ago
I've been using this stepped learning-rate schedule and have been consistently getting slightly better training and validation losses; with it I got 1.98 val loss: lr = 0.1 if i < 100000 else (0.01 if i < 150000 else 0.001)
@alekseizinchenko1171 · 1 year ago
Just in time ❤
@lotfullahandishmand4973 · 1 year ago
Dear Andrej, your work is amazing; we are here to share and have a beautiful world all together, and you are doing that. If you could make a video about convolutional NNs, top ImageNet architectures, or anything deep learning related to vision, that would be great. Thank you!
@Erosis · 1 year ago
Numpy / torch / tf tensor reshaping always feels like handwavy magic.
@amirkonjkav5374 · 1 year ago
Thanks for your videos. Is it possible to talk about NLP, especially its background?
@reubenthomas1033 · 1 year ago
Awesome content!!
@netanelmad · 5 months ago
Thank you very much.
@aisolutions834 · 1 year ago
Hi Andrej, great content! Would you please go over the Transformer paper and its implementation?
@jackfrost7734 · 1 year ago
@AndrejKarpathy are you planning to introduce the topic of uncertainty estimation in NN models?
@colehoward5144 · 1 year ago
Great video! In your next video, would you be able to add a section where you show how to matrix-multiply n-dimensional tensors? I am a little confused by what the output/shape should be for something like (6, 3, 9, 9) @ (3, 9, 3)
@milosz7 · 6 months ago
multiplying matrices with these shapes is not possible
@colehoward5144 · 6 months ago
@milosz7 Yeah, it doesn't look like it at first, but they are compatible. Results in output shape (6, 3, 9, 3)
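For the record, a quick check of the broadcasting rule: torch.matmul multiplies the last two dims and broadcasts the leading batch dims:

```python
import torch

a = torch.randn(6, 3, 9, 9)
b = torch.randn(3, 9, 3)

# matrix part: (9, 9) @ (9, 3) -> (9, 3)
# batch part:  (6, 3) broadcast against (3,) -> (6, 3)
out = a @ b
print(out.shape)   # torch.Size([6, 3, 9, 3])
```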
@CarlosReyes-ku6ub · 1 year ago
Awesome, thank you so much
@mynameisZhenyaArt_ · 3 months ago
Hi Andrej. Is there going to be an RNN, LSTM, GRU video? Or maybe even a part 2 on the topic of WaveNet with the residual connections?
@nova2577 · 1 year ago
Could you also do some videos related to wav2vec, as well as the GPT series? Much appreciated!! Started following your online video lectures when you were at Stanford.
@simonkotchou9644 · 1 year ago
Thanks so much
@aashishaggarwal3231 · 11 days ago
Very niceee..... (thank you)