The spelled-out intro to language modeling: building makemore

  601,218 views

Andrej Karpathy

a day ago

We implement a bigram character-level language model, which we will further complexify in follow-up videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks and (2) the overall framework of language modeling that includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification).
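For reference, the counting half of the approach described above can be sketched in a few lines. This is a hedged illustration, not the video's exact notebook: the word list is a made-up stand-in for the names.txt dataset, and variable names are illustrative.

```python
import torch

words = ["emma", "olivia", "ava"]  # hypothetical stand-in for the names dataset
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0  # single start/end token
V = len(stoi)

# "training": count bigrams into a 2D tensor
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# normalize each row into a probability distribution
# (broadcasting: (V, V) / (V, 1)); the +1 is fake-count smoothing
P = (N + 1).float()
P = P / P.sum(1, keepdim=True)

# evaluate the average negative log likelihood of the data
log_likelihood = 0.0
n = 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
nll = -log_likelihood / n
```

Lower average NLL means the model assigns higher probability to the observed bigrams.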
Links:
- makemore on github: github.com/karpathy/makemore
- jupyter notebook I built in this video: github.com/karpathy/nn-zero-t...
- my website: karpathy.ai
- my twitter: / karpathy
- (new) Neural Networks: Zero to Hero series Discord channel: / discord , for people who'd like to chat more and go beyond YouTube comments
Useful links for practice:
- Python + Numpy tutorial from CS231n cs231n.github.io/python-numpy... . We use torch.tensor instead of numpy.array in this video. Their design (e.g. broadcasting, data types, etc.) is so similar that practicing one is basically practicing the other, just be careful with some of the APIs - how various functions are named, what arguments they take, etc. - these details can vary.
- PyTorch tutorial on Tensor pytorch.org/tutorials/beginne...
- Another PyTorch intro to Tensor pytorch.org/tutorials/beginne...
Exercises:
E01: train a trigram language model, i.e. take two characters as an input to predict the 3rd one. Feel free to use either counting or a neural net. Evaluate the loss; did it improve over the bigram model?
E02: split up the dataset randomly into an 80% train set, 10% dev set, and 10% test set. Train the bigram and trigram models only on the training set. Evaluate them on the dev and test splits. What can you see?
E03: use the dev set to tune the strength of smoothing (or regularization) for the trigram model - i.e. try many possibilities and see which one works best based on the dev set loss. What patterns can you see in the train and dev set loss as you tune this strength? Take the best setting of the smoothing and evaluate on the test set once, at the end. How good of a loss do you achieve?
E04: we saw that our 1-hot vectors merely select a row of W, so producing these vectors explicitly feels wasteful. Can you delete our use of F.one_hot in favor of simply indexing into rows of W?
E05: look up and use F.cross_entropy instead. You should achieve the same result. Can you think of why we'd prefer to use F.cross_entropy instead?
E06: meta-exercise! Think of a fun/interesting exercise and complete it.
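E04 and E05 can be sanity-checked in a few lines. This is a hedged sketch with made-up toy indices (`xs`, `ys`, and the seed are illustrative, not from the video):

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(2147483647)
W = torch.randn((27, 27), generator=g, requires_grad=True)
xs = torch.tensor([0, 5, 13])  # hypothetical input character indices
ys = torch.tensor([5, 13, 0])  # hypothetical target indices

# E04: multiplying a one-hot vector by W just selects a row of W
xenc = F.one_hot(xs, num_classes=27).float()
logits_onehot = xenc @ W
logits_indexed = W[xs]  # same result, no one-hot materialized

# E05: manual softmax + negative log likelihood equals F.cross_entropy
logits = logits_indexed
counts = logits.exp()
probs = counts / counts.sum(1, keepdim=True)
manual_loss = -probs[torch.arange(len(xs)), ys].log().mean()
builtin_loss = F.cross_entropy(logits, ys)
```

F.cross_entropy is preferable in practice: it fuses log-softmax and NLL, is numerically stabler (no overflow from large logits), and avoids materializing the intermediate tensors.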
Chapters:
00:00:00 intro
00:03:03 reading and exploring the dataset
00:06:24 exploring the bigrams in the dataset
00:09:24 counting bigrams in a python dictionary
00:12:45 counting bigrams in a 2D torch tensor ("training the model")
00:18:19 visualizing the bigram tensor
00:20:54 deleting spurious (S) and (E) tokens in favor of a single . token
00:24:02 sampling from the model
00:36:17 efficiency! vectorized normalization of the rows, tensor broadcasting
00:50:14 loss function (the negative log likelihood of the data under our model)
01:00:50 model smoothing with fake counts
01:02:57 PART 2: the neural network approach: intro
01:05:26 creating the bigram dataset for the neural net
01:10:01 feeding integers into neural nets? one-hot encodings
01:13:53 the "neural net": one linear layer of neurons implemented with matrix multiplication
01:18:46 transforming neural net outputs into probabilities: the softmax
01:26:17 summary, preview to next steps, reference to micrograd
01:35:49 vectorized loss
01:38:36 backward and update, in PyTorch
01:42:55 putting everything together
01:47:49 note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
01:50:18 note 2: model smoothing as regularization loss
01:54:31 sampling from the neural net
01:56:16 conclusion

Comments: 661
@minjunesong6667 · a year ago
I haven't commented on a YouTube video since 2017. But I have to, on the slim chance that you actually read this comment, Andrej! Please keep doing what you are doing! You are an absolute gem of an educator, and the millions of minds you are enlightening with each video will do great things that will compound and make the world a better place.
@AndrejKarpathy · a year ago
reminded of kzbin.info/www/bejne/eGmmZqagn82mqdE :D
@dhatrimukkamalla · a year ago
@@AndrejKarpathy You sure made the YT comments section a better place lol... Excellent videos, please keep them coming, or shall I say make more! Thank you!!
@NewGirlinCalgary · a year ago
@@AndrejKarpathy 🤣🤣🤣
@khalilsabri7978 · a year ago
Thanks for writing this comment for all of us! Please keep up these videos; as Minjune said, you're a gem of an educator!
@PatrickHoodDaniel · a year ago
@@AndrejKarpathy So lo no mo. Classic!
@vincentd2418 · a year ago
What a privilege to be learning from someone as accomplished as Andrej, all for free. The internet at its best🙏
@RalphDratman · a year ago
Just what this is -- a privilege indeed! We don't even have to pay tuition, or travel to Stanford.
@barjeshbarjesh8215 · 10 months ago
I am not lucky; I am blessed!
@nickfruneaux5232 · 8 months ago
absolutely!!!
@kamikaze9271 · 7 months ago
So true!
@iNTERnazionaleNotizia589 · 5 months ago
Hopefully YouTube will be free FOREVER AND EVER, not like Medium or Towardsdatascience...
@AndrejKarpathy · a year ago
Update: I added some suggested exercises to the description of the video. imo learning requires in-person tinkering and work, watching a video is not enough. If you complete the exercises please feel free to link your work here. (+Feel free to suggest other good exercises!)
@AndrejKarpathy · a year ago
@@ibadrather oh. Please go to Discord then, linked in the description. Sorry :\
@ibadrather · a year ago
I don't know if this is a good place for Q&A, but there is something I need to ask that I can't wrap my head around. I was training the trigram language model and the loss was lower than for the bigram language model, but the model was worse. I tried to generate a few names and I realised I had made a huge error in data preparation. The question I have is: how big of an indicator is loss? Is loss the only thing that matters for model performance? I understand there are other metrics of model performance. I have actually faced something similar in my work. I am stabilizing a video using an IMU sensor, and I am training a NN for camera pose estimation. For different architectures, the lower-loss models do not necessarily perform better. When our team looks at the stabilized video, many times the model with the higher loss generates a visually better stabilized video. I don't quite understand this. That's why I am asking how indicative loss is of model performance. I don't expect you to answer this here, but maybe you could talk about it in your future lectures or somewhere else.
@OwenKraweki · a year ago
My loss for trigrams, count model, using all data for training was around 2.0931, and I was able to get close with the NN approach. I'm not sure if the resulting names were better, but I wasn't able to generate exactly the same names with the count and NN approaches anymore (even using the same generator). Also I'm not sure how to best share/link my solution (I have the notebook on my local drive).
@sanesanyo · a year ago
I built the trigram model by concatenating the one-hot encoded vectors for the first two letters & feeding them through a neuron & the rest is the same. I think that is a fine way to train a trigram model. Any views on that? I did attain a lower loss compared to the bigram, although the results are not significantly better.
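For what it's worth, the concatenation idea above sketches out roughly like this (the indices, seed, and shapes here are made-up illustrations, not the commenter's actual code):

```python
import torch
import torch.nn.functional as F

g = torch.Generator().manual_seed(42)
# hypothetical context: indices of the two preceding characters, and targets
x1 = torch.tensor([0, 5, 13])
x2 = torch.tensor([5, 13, 1])
ys = torch.tensor([13, 1, 0])

# concatenate the two one-hots into a 54-dim input, then one linear layer
xenc = torch.cat([F.one_hot(x1, 27), F.one_hot(x2, 27)], dim=1).float()
W = torch.randn((54, 27), generator=g, requires_grad=True)
logits = xenc @ W
loss = F.cross_entropy(logits, ys)
loss.backward()
```

Note this is equivalent to summing one row from the top half of W and one from the bottom half; an alternative design is a single 27*27 = 729-way one-hot over the character pair, which gives each pair its own row.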
@stanislavnikolov7423 · a year ago
@@ibadrather Not an expert myself, but here's how I would explain it: coming up with a loss function is like coming up with a target you optimise for. Apparently your perception of how good a result is (your human-brain loss function) differs from what you optimise your network toward. In that case you should come up with a better equation to match your gut feeling. Practical example: let's say you want to train a network that produces ice cream, and your loss function is the amount of sugar in the ice cream. The best network you train crushes the loss, but produces simple 100% sugar syrup. It does not have the texture and consistency of real ice cream. A different network may make great ice cream texture-wise, but put less sugar in it, thus having a worse loss. So, adjust your loss function to score for texture as well.
@clray123 · a year ago
The reason why this is such excellent teaching is that it's constructed bottom-up. It builds more abstract concepts using more concrete ones; generalizations follow concrete examples. At no point is there a situation in which the learner has to "just assume" something, which will "become clear later on" (in most instances when a teacher says that, it doesn't; it just causes people to desperately try to guess the required knowledge on their own to fill in the gaps, distracting from anything that follows, and producing mental discomfort). The bottom-up approach produces pleasure from a series of little "a-ha" and "I agree" moments and a general trust in the teacher. I contrast this to the much worse fastai courses, in which there are lots of gaps and hand-waving because of their top-down approach.
@deyjishnu · 5 months ago
This is exactly my experience as well. Well said.
@dx5nmnv · a year ago
You're literally the best, man. These lessons are brilliant; hope you keep doing them. Thank you so much
@talhakaraca · a year ago
Seeing him back in education is great. Hope to see some computer vision lessons 👍👌
@HarishNarayanan · a year ago
@@talhakaraca If you search on this very site you will find truly clear lessons on computer vision from Andrej (from like 2016 or so)!
@talhakaraca · a year ago
@@HarishNarayanan Thanks a lot, I found it 🙏
@HarishNarayanan · a year ago
@@talhakaraca You are very welcome. It was that lecture series that got me first informed and interested in deep learning.
@RalphDratman · a year ago
I cannot imagine a better, or kinder, teacher. He is feeding his audience knowledge and understanding, in small delicious bites, without worrying about their level of prior knowledge. And he is smiling irrepressibly all the time! Such a good person.
@amanjha9759 · a year ago
The scale of impact these lectures will have is going to be enormous. Please keep doing them and thanks a lot Andrej.
@samot1808 · a year ago
I am pretty confident that the impact of his CS231n course is bigger than even his work at Tesla. I know too many people working in machine learning who were introduced to the field by CS231n. It changed my life. Makes you wonder if he should just spend all his efforts on teaching. The impact is truly exponential.
@XorAlex · a year ago
Too many people are working on AI performance and not enough people are working on AI alignment. If this trend continues, the impact might be enormously negative.
@samot1808 · a year ago
@@XorAlex please explain
@izybit · a year ago
@@samot1808 AI alignment and the AI control problem are aspects of how to build AI systems such that they will aid rather than harm their creators. It basically means that we are like kids playing with plutonium, and it won't take much for someone to turn it into a bomb (on purpose or by mistake) and make everyone's life a living hell. All that leads to a need for more regulation and oversight of the really advanced AI models, because otherwise we may end up with AI generators that can take a photo of you and create a video showing you killing babies, or worse, an AI that self-replicates and takes over entire systems, leading to collapsed economies and countries (or, maybe, even something like the Terminator).
@lukeanglin263 · 3 months ago
Never in my life have I found an educator like this. This is free gold.
@vonziethenmusic · 14 days ago
So fascinating! And then scale it up and it starts talking like real humans and knows everything! This is hilarious!! It's so cool that you just put these vids out here!!
@mahdi_sc · 10 months ago
The video series featured on your channel undoubtedly stands as the most consequential and intuitive material I have encountered. The depth of knowledge gained from these instructional materials is significant, and the way you've presented complex topics with such clarity is commendable. I find myself consistently recommending them to both friends and colleagues, as I truly believe the value they offer to any learner is unparalleled. The gratitude I feel for the work you've put into these videos is immense, as the concepts I've absorbed from your content have undoubtedly expanded my understanding and competence. This invaluable contribution to the field has benefited me tremendously, and I am certain it has equally enriched others' learning experiences.
@swfsql · 10 hours ago
Thanks a lot for the video! One way to help describe some things would be to say that there is a 27-faced die being tossed, and depending on the face that comes up, that is the resulting character (. a b c etc). Also the die can be unfair, that is, it's not the exact same chance for every face. Then for the first (counting) approach, what is being done is that 27 different dice are being prepared - one for each row, for each possible previous character. Then for the second approach, instead of having 27 pre-made dice, the network is able to produce the (unfair) die by itself, instead of having it explicitly pre-made.
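The die analogy above maps directly onto torch.multinomial; here is a minimal sketch with a made-up unfair 27-faced die (the seed and probabilities are illustrative):

```python
import torch

g = torch.Generator().manual_seed(2147483647)

# a hypothetical unfair 27-faced die: one probability per face/character
p = torch.rand(27, generator=g)
p = p / p.sum()

# toss the die 10 times; each toss picks a face index in [0, 27)
ix = torch.multinomial(p, num_samples=10, replacement=True, generator=g)
```

In the bigram model, each row of the probability matrix is one such die, selected by the previous character.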
@jakobpcoder · a year ago
Absolutely insane levels of detail you are going into. This series is invaluable for beginners in the field as well as for people like me, who are building our own models all the time but want to go back to basics from time to time, to not get stuck in wrong assumptions learned from fast success with Keras :D I really hope you will continue this series for quite a while! Thanks a lot, AI Grand Master Andrej!
@krzysztofwos1856 · a year ago
Andrej, your videos are the clearest explanations of these topics I have ever seen. My hat is off to you. I wish you had taught my ML and NLP classes in college. There's a huge difference between your ground-up, code-first approach and the usual dry, academic presentation of these topics. It also demonstrates the power of YouTube as an educational tool. Thank you for your efforts!
@MattRosinski · a year ago
I love how you make the connections between the counting and gradient-based approaches. Seeing that the predictions from the gradient descent method were identical to the predictions from the statistical probabilities from the counts was, for me, a big aha moment. Thank you so much for these videos Andrej. Seeing how you build things from the ground up to transformers will be fascinating!
@dansplain2393 · a year ago
I've literally never heard the "logits are counts, softmax turns them into probs" way of thinking before. Worth the ticket price alone!
@mrdbourke · a year ago
Another weekend watch! Epic to see these videos coming out! Thank you for all your efforts, Andrej!
@realquincyhill · a year ago
Your intro to neural networks video is amazing, especially how you focused on the raw mathematical fundamentals rather than just implementation. I can tell this is going to be another banger.
@ax5344 · a year ago
OMG, I feel soooo grateful for the internet! I would never have met a teacher this clear and this suited to my needs in real life. I have watched the famous Stanford courses before; they have set a standard in ML courses. It is always the Stanford courses and then the rest. Likewise, this course is setting a new standard for hands-on courses. I'm only half an hour into the video and I'm already amazed by the sensitivity, clarity and organization of the course. Many many thanks for your generosity in stepping out and sharing your knowledge with numerous strangers in the world. Much, much indebted! Thank you!
@a9raag · 5 months ago
This is one of the best educational series I have stumbled upon on YT in years! Thank you so much Andrej
@saintsfan8119 · a year ago
Lex guided me here. I loved your micrograd tutorial. It brought back my A-level calculus and reminded me of my Python skills from years back - all whilst teaching me the basics of neural networks. This tutorial is now putting things into practice with a real-world example. Please do more of these, as you're sure to get more people into the world of AI and ML. Python is such a powerful language for manipulating data and you explain it really well by building things up from a basic foundation into something that ends up being fairly complex.
@FireFly969 · 27 days ago
I love how you take NNs and explain them to us, not via the already built-in functions in PyTorch, but by how things work, then give us the equivalent of it in PyTorch
@Consural · a year ago
A teacher that explains complex concepts both clearly and accurately. I must be dreaming. Thank you Mr. Karpathy.
@niclaswustenbecker8902 · a year ago
I love your clear and practical way of explaining stuff, the code is so helpful in understanding the concepts. Thanks a lot Andrej!
@RaviAnnaswamy · a year ago
Mastery is the ability to stay with fundamentals. Andrej derives the neural architecture FROM the counts-based model! So the log counts, counts and probs are wrapped around the idea of how to get to probs similar to the counts model. Thus he explains why you need the log, why you need to normalize, then introduces the name for it: softmax! What a way to teach. The brilliant master stroke is when he shows that the samples from the neural model exactly match the samples from the counts model. Wow, I would not have guessed it, and many teachers might not have checked it. The connection between 'smoothing' and 'regularization' was also a nice touch. Teaching the new concepts in terms of the known, so that there is always a way to think about new ideas rather than taking them as given. For instance, the expected optimal loss of the neural model is what one would see in the counts model. Thanks Andrej! By the way, one way to interpret the loss is perplexity: the number 2.47 says that, on average, the model is about as uncertain as choosing uniformly among e^2.47 ≈ 12 characters.
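The perplexity reading of the loss can be checked in a couple of lines (2.47 is the approximate bigram loss quoted above; the exp assumes the loss uses the natural log, as in the video):

```python
import torch

nll = torch.tensor(2.47)            # average negative log likelihood (natural log)
perplexity = nll.exp()              # effective branching factor, about 11.8 here
uniform_nll = torch.tensor(27.0).log()  # a uniform model over 27 characters
                                        # would score log(27), about 3.30
```

So the bigram model is roughly as uncertain as a fair 12-sided die, a real improvement over the 27-way uniform baseline.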
@jorgitozor · 24 days ago
Really incredible how you can clearly explain a complex subject with only raw material. Thanks a lot for the valuable knowledge
@tusharkalyani4343 · a year ago
This video is a goldmine. It's so intuitive and easy to understand. Even my grad classes could not squeeze this much information into a semester-long course. Hats off, and it's a privilege to be learning from an accomplished AI master and the best. Thank you for the efforts Andrej :).
@myfolder4561 · 5 months ago
Thank you Andrej. I can't stress enough how much I have benefited from and felt inspired by this series. I'm a 40 yo father with a young kid. Work and being a parent have consumed lots of my time - I have always wanted to learn ML/neural networks from the ground up, but a lot of the materials out there are just thick and dense and full of jargon. Coming from a math and actuarial background I had kind of expected myself to be able to pick up this knowledge without too much stumbling, but seriously, not until your videos did I finally feel so strongly interested and motivated in this subject. It's really fun learning from you and coding along with you - I'm leaving your lectures each time more energized than when they started. You're such a great educator, as many have said.
@noah8405 · a year ago
Taught me how to do the Rubik’s cube, now teaching me neural networks, truly an amazing teacher!
@akshaygulabrao4423 · a year ago
I love that he understood what a normal person wouldn’t understand and explained those parts.
@cassie8324 · a year ago
No one on YouTube is producing such granular explanations of neural network operations. You have an incredible talent for teaching! Please keep producing this series; it is so refreshing to get such clear, first-principles content on something I'm passionate about from someone with a towering knowledge of the subject.
@taylorius · a year ago
I think minimal, simple-as-possible code implementations, talked through, are just about the best possible way to learn new concepts. All power to you Andrej, and long live these videos.
@hlinc2 · a year ago
The clarity from this video of all the fundamental concepts and how they connect blew my mind. Thank you!
@imliuyifan · a year ago
Note: if you are following this in torch 2.0, the multinomial function might behave differently in getting the idx (3 instead of 13). Just downgrade to torch==1.13.1 if this bothers you.
@eriklinde · 7 months ago
Thank you! Was scratching my head a bit... Edit: Actually I still can't get it to reproduce 13 (getting 3)....
@RounakJain91 · 6 months ago
Wow thank you. I've been getting a 10 on torch 2.1.0+cu118
@paullarkin2970 · 5 months ago
@@RounakJain91 Same, which is funny because when I left num_samples=20, the first sample was 13, but 1 sample gives me 10...
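For what it's worth, within a single environment an explicitly seeded generator does reproduce the same draw; differences like those reported above come from the sample stream changing across torch versions. A minimal sketch (the uniform distribution here is a made-up example):

```python
import torch

p = torch.ones(27) / 27.0  # hypothetical uniform distribution over characters

# first draw from a seeded generator
g = torch.Generator().manual_seed(2147483647)
ix1 = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()

# reseeding with the same value reproduces the same draw in the same environment
g = torch.Generator().manual_seed(2147483647)
ix2 = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
```

The concrete index you get (3, 10, 13, ...) is therefore not guaranteed to match the video unless your torch version matches too.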
@Stravideo · a year ago
What a pleasure to watch! Love the fact that there are no shortcuts, even for what may seem easy. Everything is well explained and easy to follow. It is very nice that you show us the little things to watch for.
@vil9386 · 5 months ago
What a sense of "knowledge satisfaction" I have after watching your video and working out the details as taught. THANK YOU Andrej.
@baboothewonderspam · 9 months ago
Thank you for creating this - it's incredibly high-quality and I'm learning so so much from it! It's such a privilege to be able to learn from you.
@muhannadobeidat · a year ago
Amazing delivery as always. The fact that he spent time explaining broadcasting rules and some of the quirks of keepdim shows how knowledgeable he is, and that he knows most people struggle with little things like that to get past what they need to do.
@sebastianbitsch · a year ago
What an incredible resource - thank you Andrej. I especially enjoyed the intuitive explanation of regularization, what a smooth way of relating it to the simple count-matrix
@bluecup25 · a year ago
I love how you explain every step and function so your tutorial is accessible for non-python programmers as well. Thank you.
@talis1063 · a year ago
You're such a good teacher. Nice and steady pace of gradual buildup to get to the end result. Very aware of points where student might get lost. Also respectful of viewers time, always on topic. Even if I paid for this, I wouldn't expect this quality, can't believe I get to watch it for free. Thank you.
@blaze-pn6fk · a year ago
Amazing videos! It's insane how detailed and intuitive these videos are. Thanks for making them.
@edz8659 · a year ago
This is the first of your lessons I have watched and you really are one of the best teachers I've ever seen. Thank you for your efforts.
@radek_osmulski · a year ago
Unbelievable lecture, Andrej 🙏 So many wonderful parallels. Thanks a lot for recording this and sharing it so freely with the world 🙂
@mcnica89 · a year ago
The level of pedagogy is so, so good here; I love that you start small and build up, and I particularly love that you pointed out common pitfalls as you went. I am actually teaching a course where I was going to have to explain broadcasting this term, but I think I am just going to link my students to this video instead. Really excellent stuff! One small suggestion is to consider using Desmos instead of WolframAlpha if you just want to show a simple function
@lorisdeluca610 · a year ago
Wow. Just WOW. Andrej, you are simply too good! Thank you for sharing such valuable content on YouTube; hands down the best one around.
@guitdude13 · a year ago
This is connecting so many dots for me. I really enjoy the combo of theory with practical tips for using the torch APIs.
@jeankunz5986 · a year ago
Andrej, the elegance and simplicity of your code is beautiful, and an example of the right way to write Python
@steveseeger · a year ago
Andrej thanks for doing this. You can have a larger impact bringing ML to the masses and directly inspiring a new generation of engineers and developers than you could have managing work at Tesla.
@adarshsonare9049 · a year ago
I went through building micrograd 3-4 times. It took me a week to understand a good portion of that, and now I've started with this. I am really looking forward to going through this series. Thanks for doing this Andrej, you are amazing.
@adityay525125 · a year ago
I just want to say, thank you Dr Karpathy, the way you explain concepts is just brilliant, you are making me fall in love with neural nets all over again
@snarkyboojum · a year ago
So cool to see the equivalence between the manually calculated model and the neural network model optimised with gradient descent. It's not quite the same output either. The regularization loss is required to get the two super close too. Pretty neat.
@mahmoudabuzamel7038 · a year ago
You leave me speechless with the way you explain things and simplify concepts!!!!
@candrewlee14 · a year ago
You are incredible! This makes learning about ML so fun, your passion and knowledge really shine here. I’m a college student studying CS, and you lecture better than many professors. Not a knock on them though, props to you.
@darielsurf · a year ago
Hi Andrej, I heard two days ago (from a Lex Fridman podcast) that you were thinking of pursuing something related to education. I was surprised and very excited, wishing that it was true and accessible. Today, I ran into your YouTube channel and I couldn't be happier; thanks a lot for doing this and for sharing your valuable knowledge! The lectures are incredibly detailed and interesting. It's also very nice to see how much you enjoy talking about these topics; that's very inspiring. Again, thank you!
@yangchenyun · a year ago
This lecture is well paced and introduces concepts one by one, with the later, more complex ones built on top of the previous ones.
@pablofernandez2671 · a year ago
Amazing explanations, Andrej. Thanks for sharing your knowledge in such a clear and enlightening way. Thank you soooo much! I'm really motivated thanks to you.
@fotonical · a year ago
Another awesome breakdown, once again love how you take the time to help transfer intuition up and above implementation details.
@stephanewulc807 · a year ago
Brilliant, simple, complete, accurate; I have only compliments. Thank you very much for one of the best classes I have had in my life!
@rockapedra1130 · a year ago
Fantastic! Thank you for sharing your many years of experience! You are a truly gifted educator!
@Mutual_Information · a year ago
This is how I'm spending my time off from Thanksgiving break. Watching this whole series 🍿
@ronaldlegere · 11 months ago
There are so many fantastic nuggets in these videos even for those already with substantial pytorch experience!
@anvayjain4100 · 2 months ago
The way he explained the zip method, even a beginner can understand. From very basic Python to an entire language model. I can't thank this man enough for teaching us.
@gabrielmoreiraassuncaogass8044 · a year ago
Andrej, I'm from Brazil and love ML and coding. I have tried several different classes with various teachers, but yours was by far the best. Simplicity with quality. Congratulations! I loved the class! Looking forward to taking the next ones. The best!
@maxhansen5166 · a year ago
This channel is an amazing complement to Andrew Ng's DL Specialisation!!!
@jtl_1 · 2 months ago
Besides having the best explanation of LLMs from this great teacher, you get a free hands-on Python course, which also has better explanations than lots of others. Thanks a lot, Andrej!
@alternative1967 · 2 months ago
You're lifting the lid on the black box, and it feels like I'm sitting on a perceptron and watching the algos make their changes, forward and back. It has provided such a deeper understanding of the topics in the video. I have recommended it to my cohort of AI students, of which I am one, as supplementary learning. But to be honest, this is the way it should be taught. Excellent job, Andrej.
@oshaya · a year ago
Not exactly revolutionary but so damn well explained and resolved in PyTorch. This is ML pedagogy for the masses. I praise your efforts.
@oshaya · a year ago
However, getting people to understand the nitty-gritty of a Transformer Language Model (like GPT), that will prove truly revolutionary!
@tycox9364 · a year ago
Holy shit, an actual starting point ❤️
@raziberg · a year ago
Thanks a lot for the videos. I was familiar with the concepts of basic machine learning but not with the actual workings of it. You really helped me get to the next level of understanding.
@log_of_1 · 6 months ago
Thank you for taking the time out of your busy days to teach others. This is a very kind act.
@sharangkulkarni1759 · 5 months ago
I like the idea of pairing two opposite forces and achieving perfect balance. Regularization
@javidjamae · 8 months ago
Phenomenal tutorial, thanks so much! I went through the entire thing and built it up from scratch and learned a ton!
@DocLow · a year ago
Thank you for posting these! It's extremely valuable. The end result of the neural net wasn't all that anticlimactic; at least the longer "names" did differ slightly, so it wasn't 100% the same weights as in the first case :)
@biswaprakashmishra398 · a year ago
The density of information in these tutorials is hugeeeee.
@BT-te9vx · 10 months ago
10 mins into the video, I'm amazed, smiling and feeling as if I've cracked the code to life itself (with the dictionary of bigram counts). Of course, I can't build anything with the level of knowledge I currently have, but I sure can appreciate how it works in a much better manner. I always knew that things are predicted based on their occurrence in data, but somehow seeing those counts (e.g. for the first 10 words, `('a', ''): 7`) makes it so glaringly obvious, which no amount of imagination could've done for me. You are a scientist, researcher, highly paid exec, knowledgeable, an innovator, but more than anything, you are the best teacher, one who can elucidate complex things in simple terms which then all make sense and seem obvious. And that requires not just mastery but a passion for the subject.
@pauek · a year ago
Two hours of pure clarity... soo addicting!!
@jedi10101 · 6 months ago
New sub here. I started with "Let's build GPT: from scratch, in code, spelled out". I learned a lot, enjoyed coding along, and appreciated the thoughtful explanations. I'm hooked and will be watching the makemore series. Thank you very much, sir, for sharing your knowledge.
@pastrop2003 · a year ago
Thank you Andrej, this is absolutely the best hands-on coding neural nets & PyTorch tutorial. Special thanks for decoding cryptic PyTorch docs. Very, very useful!
@yuanhu6031 · 4 months ago
I absolutely love this entire series: high-quality content and very educational. Thanks a lot for doing this for the good of the general public.
@DigiTechAnimation-xk1tp · 3 months ago
The music in this video is perfect. It really sets the mood and creates a powerful emotional connection.
@kindoblue · 1 year ago
Thank God you are pushing videos! Grateful 🤟
@owendorsey5866 · 1 year ago
So excited to watch this! The tutorial about micrograd was great. Can’t wait to make my way through this one, loving the content :)
@richardmthompson · 11 months ago
There's so much packed in there. I spent the whole day on this and got to the 20 minute mark, haha. Great teacher, thank you for this logical and practical approach.
@timbuthfer901 · 8 months ago
Thank you, Andrej, so much for these brilliant presentations. You explain each step perfectly! I have built my own simple neural network framework as an exercise to understand the gradient descent concepts, and I am just starting to look at PyTorch. I really like the autograd functionality; it's very flexible, as essentially the developer can use any combination of differentiable functions in the activation layer.
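The flexibility this comment points at, backprop working through any composition of differentiable functions, is easiest to see in a micrograd-style scalar sketch (a toy illustration in the spirit of the series, not PyTorch's actual implementation):

```python
import math

# A minimal scalar autograd value, micrograd-style (a sketch, not PyTorch).
class Value:
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort of the graph, then the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# Any composition of the ops above is differentiable end to end:
x = Value(0.5)
y = (x * 2.0 + 1.0).tanh()
y.backward()
print(x.grad)  # d/dx tanh(2x + 1) = 2 * (1 - tanh(2x + 1)^2)
```

Because each op only knows its own local derivative, the user is free to compose them arbitrarily, which is exactly the flexibility the comment describes.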
@tolifeandlearning3919 · 6 months ago
This is so awesome. Thanks Andrej for being so nice and sharing your knowledge.
@JuanuHaedo · 1 year ago
Please DON'T STOP doing this! The world is so lucky to have you sharing this knowledge!
@spazioLVGA · 1 year ago
You definitely have a talent for education. Use it and you'll do so much good for so many people. Thank you Andrej!
@jimmy21584 · 1 year ago
I’m an old-school software developer, revising machine learning for the first time since my undergrad studies. Back in the day we called them Markov Chains instead of Bigram Models. Thanks for the fantastic refresher!
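The connection the comment draws is worth spelling out: a bigram model is a first-order Markov chain over characters, where the next character depends only on the current one. A minimal sampling sketch (the transition counts below are made up for illustration, not taken from the video's dataset):

```python
import random

# A bigram model as a Markov chain: next char depends only on current char.
# '.' is the start/end token; these counts are hypothetical.
counts = {
    ".": {"a": 2, "b": 1},
    "a": {"b": 1, ".": 2},
    "b": {"a": 1, ".": 1},
}

def sample_name(rng):
    out, ch = [], "."
    while True:
        options = counts[ch]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == ".":
            return "".join(out)
        out.append(nxt)
        ch = nxt

rng = random.Random(42)
print(sample_name(rng))
```

Normalizing each row of counts into probabilities gives exactly the Markov transition matrix; sampling walks the chain until the end token is drawn.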
@benjaminlai5638 · 1 year ago
Andrej, thank you for creating these videos. They are the perfect balance of theory and practical implementation.
@vq8gef32 · 1 month ago
Thank you so much, Andrej. Amazing! I am really enjoying every second of this series.
@odiseezall · 1 year ago
I was searching for a series to explain in-depth what a transformer is. I found it.
@chesstictacs3107 · 1 year ago
This is such valuable content! Keep it up, Andrej! Thank you for sharing your knowledge!
@AntLabJPN · 1 year ago
Another absolutely fantastic tutorial. The detail is incredible and the explanations are so clear. For anyone watching this after me, I feel that the micrograd tutorial is absolutely essential to watch first if you want to really understand things from the ground up. Here, for example, when Andrej runs the loss.backward() function, you'll know exactly what's happening, because you do it manually in the first lesson. I feel that the transition from micrograd (where everything is built from first principles) to makemore (relying on the power of PyTorch) leaves you with a surprisingly deep understanding of the fundamentals of language modeling. Really superb.
@mlock1000 · 3 months ago
Doing this again to really solidify the fundamentals, and GitHub Copilot is hilarious. It's seen this code so many times that if it is enabled you can't actually type everything out for yourself! It all comes rolling out (pretty much perfect, tweaked to my style and ready to run) after the first character or two. Amazing times. Got to say, whoa. This is so good. So sick of fumbling about with tensors, and this is a masterclass for sure. Thank you, thank you, thank you.
@djubadjuba · 1 year ago
Amazing, this baby-steps approach is so powerful to cement the concepts.
@Pythoncode-daily · 6 months ago
Thank you for the unique opportunity to learn how to write code from the most advanced developer, Andrej! An almost priceless and irreplaceable opportunity! Extremely useful and efficient!
@esaliya · 1 year ago
This is like listening to a lecture by Richard Feynman. Super clear!
@janasandeep · 1 year ago
Thanks for explaining things at such an incredible level of detail. (1:08:49) A couple of days ago I too was searching for the difference between torch.Tensor and torch.tensor. Passing dtype explicitly seems like good practice to me. Also, the caution on the broadcasting rules is very valid. Often the bugs in my PyTorch/TensorFlow code are due to unexpected broadcasting going undetected because of, say, an averaging or reduction operation. Putting a lot of asserts on the shapes helps me.
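Both points this comment raises can be demonstrated concretely. A short sketch (assuming PyTorch is installed) showing the torch.Tensor vs torch.tensor dtype difference, and the kind of silent broadcasting bug that shape asserts catch:

```python
import torch

# torch.Tensor is a constructor that defaults to float32;
# torch.tensor infers the dtype from the data.
a = torch.Tensor([1, 2, 3])   # float32
b = torch.tensor([1, 2, 3])   # int64
assert a.dtype == torch.float32 and b.dtype == torch.int64

# A classic silent-broadcasting bug: adding shape (5,) to shape (5, 1)
# broadcasts to (5, 5) instead of the elementwise (5,) you may have wanted.
x = torch.ones(5)
y = torch.ones(5, 1)
z = x + y
assert z.shape == (5, 5)      # probably not what you intended!

# A shape assert at the call site catches this immediately:
y = y.squeeze(1)
assert (x + y).shape == (5,)
```

A reduction like `z.mean()` would then run without error on the wrong (5, 5) tensor, which is exactly why such bugs go undetected without the asserts.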
@mohitkumar-nt3sp · 1 month ago
If anyone is confused at 1:17:00, just think of it like this: the first row of x gets multiplied with the corresponding entries of w's first column, returning a scalar value which becomes (x@w)[1,1]. The operation continues for all the remaining 26 columns of w, eventually filling the values of (x@w)[1,j], with j representing the column number of w. The process then starts again with the 2nd row of x, filling the values of (x@w)[2,j], and continues until we exhaust x's rows (5 here), so the final row is (x@w)[5,j]. In short, x@w has shape (i, j), where i = number of rows of x and j = number of columns of w.
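This row-times-column picture can be checked directly: entry (i, j) of x @ w is the dot product of row i of x with column j of w. A quick verification sketch, with shapes chosen to mirror the video's 5 examples and 27 characters (note the indices here are 0-based, unlike the comment's 1-based description):

```python
import torch

# Shapes mirroring the video: 5 input rows, 27 characters.
x = torch.randn(5, 27)
w = torch.randn(27, 27)
out = x @ w

# Entry (i, j) equals the dot product of row i of x and column j of w.
i, j = 1, 3
manual = (x[i] * w[:, j]).sum()
assert torch.allclose(out[i, j], manual)

# Resulting shape: (rows of x, columns of w).
assert out.shape == (5, 27)
```

Checking one entry by hand like this is a good habit whenever a matmul's orientation is in doubt.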
@camorimd · 11 months ago
I cannot stress enough how much these videos are helping me. Thank u. A lot.
@mrmiyagi26 · 1 year ago
Amazing tutorial to explain the intricacies of various LM and ML concepts! Looking forward to the next LM video.