Create a Large Language Model from Scratch with Python

Views: 750,907

Learn how to build your own large language model, from scratch. This course goes into the data handling, math, and transformers behind large language models. You will use Python.
✏️ Course developed by @elliotarledge
💻 Code and course resources: github.com/Infatoshi/fcc-intr...
Join Elliot's Discord server: / discord
Elliot on X: / elliotarledge
⭐️ Contents ⭐️
(0:00:00) Intro
(0:03:25) Install Libraries
(0:06:24) Pylzma build tools
(0:08:58) Jupyter Notebook
(0:12:11) Download wizard of oz
(0:14:51) Experimenting with text file
(0:17:58) Character-level tokenizer
(0:19:44) Types of tokenizers
(0:20:58) Tensors instead of Arrays
(0:22:37) Linear Algebra heads up
(0:23:29) Train and validation splits
(0:25:30) Premise of Bigram Model
(0:26:41) Inputs and Targets
(0:29:29) Inputs and Targets Implementation
(0:30:10) Batch size hyperparameter
(0:32:13) Switching from CPU to CUDA
(0:33:28) PyTorch Overview
(0:42:49) CPU vs GPU performance in PyTorch
(0:47:49) More PyTorch Functions
(1:06:03) Embedding Vectors
(1:11:33) Embedding Implementation
(1:13:06) Dot Product and Matrix Multiplication
(1:25:42) Matmul Implementation
(1:26:56) Int vs Float
(1:29:52) Recap and get_batch
(1:35:07) nn.Module subclass
(1:37:05) Gradient Descent
(1:50:53) Logits and Reshaping
(1:59:28) Generate function and giving the model some context
(2:03:58) Logits Dimensionality
(2:05:17) Training loop + Optimizer + Zerograd explanation
(2:13:56) Optimizers Overview
(2:17:04) Applications of Optimizers
(2:18:11) Loss reporting + Train VS Eval mode
(2:32:54) Normalization Overview
(2:35:45) ReLU, Sigmoid, Tanh Activations
(2:45:15) Transformer and Self-Attention
(2:46:55) Transformer Architecture
(3:17:54) Building a GPT, not Transformer model
(3:19:46) Self-Attention Deep Dive
(3:25:05) GPT architecture
(3:27:07) Switching to Macbook
(3:31:42) Implementing Positional Encoding
(3:36:57) GPTLanguageModel initialization
(3:40:52) GPTLanguageModel forward pass
(3:46:56) Standard Deviation for model parameters
(4:00:50) Transformer Blocks
(4:04:54) FeedForward network
(4:07:53) Multi-head Attention
(4:12:49) Dot product attention
(4:19:43) Why we scale by 1/sqrt(dk)
(4:26:45) Sequential VS ModuleList Processing
(4:30:47) Overview Hyperparameters
(4:32:14) Fixing errors, refining
(4:34:01) Begin training
(4:35:46) OpenWebText download and Survey of LLMs paper
(4:37:56) How the dataloader/batch getter will have to change
(4:41:20) Extract corpus with winrar
(4:43:44) Python data extractor
(4:49:23) Adjusting for train and val splits
(4:57:55) Adding dataloader
(4:59:04) Training on OpenWebText
(5:02:22) Training works well, model loading/saving
(5:04:18) Pickling
(5:05:32) Fixing errors + GPU Memory in task manager
(5:14:05) Command line argument parsing
(5:18:11) Porting code to script
(5:22:04) Prompt: Completion feature + more errors
(5:24:23) nn.Module inheritance + generation cropping
(5:27:54) Pretraining vs Finetuning
(5:33:07) R&D pointers
(5:44:38) Outro
🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan
--
Learn to code for free and get a developer job: www.freecodecamp.org
Read hundreds of articles on programming: freecodecamp.org/news

Comments: 590

@codewithdevhindi9937 10 months ago

Highly appreciate that you guys make things like this for free!

@JoeD0403 10 months ago

There goes my weekend. Thank you! This is absolutely amazing material! I’m 5 minutes in and already hooked.

@webgpu 10 months ago

5 minutes ? Great! just watch the remaining 5 hours and 40 minutes ! 😆

@JoeD0403 10 months ago

@@webgpu Took more than the weekend, but just finished… 🥵

@webgpu 10 months ago

@@JoeD0403 did you finish watching almost all 6 hrs ??? OMG you deserve an award! 🏆🍻 (i'm sure the time spent on it is absolutely worth it 👍)

@JoeD0403 10 months ago

@@webgpu Break it up into 15-30 min at a time over a few days and it’s not so bad. It was VERY thorough and amazing to see how deep the rabbit hole goes, so far.

@manan-543 7 months ago

Hey there. I am planning to do this course. I want to get into the gen ai field. I've already done a different training and built some Langchain programs. But I want to go deeper. Are there any prerequisites for this? Do I need to know pytorch or tensorflow or something else

@besomewheredosomething 8 months ago

Overall a good course. It's a bit rough around the edges, but if you are persistent or already somewhat knowledgeable you can get through it and come out a bit smarter on the other end. Worth the almost 6 hours.

@andreramalho9908 10 months ago

Anyone can learn hard concepts from this guy, cuz he turns hard concepts into easily understandable ones! Congrats for being such an amazing teacher!! Teaching is all about facilitating the process of learning, but somehow people tend to overcomplicate things so they can sound smarter than the majority. This guy does exactly the opposite of it. Thank u and God bless!

@AhmedKhaliet 10 months ago

Great work as usual❤ can't wait to deep dive in it

@samuraipiang8203 10 months ago

Thanks FCC I have been waiting this kind of Course. At last on this❤❤❤❤

@ayushagrawal2405 10 months ago

Yesterday I was searching for this, and today you dropped it! Great job people 🙌

@Karthik_Beatzzz 7 months ago

Just finished this course. One of the best courses out there to understand basic concepts of LLM. Can't believe this goldmine exists for free on YT

@Whatismyhandle 6 months ago

@@BajaFreeze I think he try to look smarter.

@brunothedev 6 months ago

@@BajaFreeze 80/20 rule, look it up

@akshatphadtare2437 6 months ago

Hey what are the prereq for this course?

@brunothedev 6 months ago

@@akshatphadtare2437 Nothing really, just the patience to research and understand the content

@nickm.4274 3 months ago

@@brunothedev I'd say you should have decent coding experience before starting, and some knowledge of linear algebra and probability theory will also be helpful. Diving right into a bunch of libraries and trying to understand what's going on would be very confusing if you don't know the basics of programming, and while you might be able to complete the course I'd find it hard to believe a beginner would have truly learned much from it.

@8eck 10 months ago

Damn, this looks like a jewel. Will definitely look into it. Thank you for sharing this!

@adammonson 4 months ago

Your patients and ability to explain in basic terms makes learning easy. Thank you, your efforts and willingness to share information is much appreciated.

@artistpw 2 months ago

You might want to change "patients" to "patience".

@DataPulse 10 months ago

Great content for free. Wishing that this hits 1 million views soon. Keep up the great work.

@WasimAbdul_1994 9 months ago

Finally someone explained transformers properly! Great job!

@jostafro4967 10 months ago

"in this course, you're going to learn a LOT of crazy stuff!" I knew this was going to be a good one!

@dantefrias 3 months ago

Thanks FCC, thanks for your effort Elia!! Good job!

@artizanal60 10 months ago

I recently searched the internet for a course about creating an LLM from scratch, to get an idea of how it works. I was disappointed by the lack of results. And then this comes out. Thanks!

@elliotarledge 10 months ago

Yessir

@erfanshayegani3693 3 months ago

Awesome stuff! Congrats on being such a good teacher!

@golmatol6537 10 months ago

Just started this tutorial and I'm sure it's going to be a great course. But a REQUEST -- For future tutorials, please use larger fonts / zoomed-in windows & terminals (just a wee-bit larger would help tremendously). After 30 mins of eye-strain, I started to get a headache. Also, a dark theme (as available in jupyter lab and vscode) would also help.

@santiagomartinez3417 9 months ago

A dark theme would help to make us feel depressed. No, thank you.

@samarthd.n.2314 9 months ago

@@santiagomartinez3417 You aren't a real programmer if you use light mode

@manan-543 7 months ago

@@santiagomartinez3417 dark theme ruins your eyes. Try working 10 hours with a white screen. Sheesh

@limageur 6 months ago

Dark theme make your eye force...

@JohnLauerGplus 5 months ago

At what point in the tutorial do you ask the LLM a question and it returns the Wizard of Oz text you trained in the model? I can't find it. I just want to see how well it worked. I've listened to 20 minutes at the start and will finish the whole thing, but curious if anyone knows. None of the chapters say something like "Final Testing to Show it Works."

@AlaskaJiuJitsu 10 months ago

Putting this in saved for when I get a neurolink chip 🍟 🙌

@TheRealNickG 10 months ago

Why? Neuralink isn't like The Matrix, "now I know kung fu...." That isn't a thing. I mean if you mean access to the video through a mere thought, go for it. But there is no world, even with all of the magic happening right now, where storing the information is the same as the experience of knowing that information.

@AlaskaJiuJitsu 10 months ago

@@TheRealNickG No, but using no-code tools to take over the world and learning bit by bit (like I am currently doing) is much more efficient than starting this today for me. I saved the video. I am a BJJ brown belt, I know that nothing will replace thousands of hours and gallons of blood* not to mention multiple combat deployments as an Infantry Paratrooper, I am just simply using echelon of tools for time management. Peace be upon and what not - enjoy your negative Kafkaesque rhetoric somewhere else lol.

@xaviermagnus8310 10 months ago

@@TheRealNickG But with enough knowledge of brain data encoding.. it could be. What would stop the potential to overwrite with ai?

@echofloripa 10 months ago

@@TheRealNickG never say never 😉

@TheRealNickG 10 months ago

I mean it's really not an argument. Maybe it is "possible" from a technical standpoint but you're still ignoring the warning I'm trying to send. How are you gonna know how to take a punch if you've never experienced it before? Only by the injection of a synthetic experience of someone else. Deep Space Nine had an episode where these aliens made a dude experience a whole prison sentence in an instant. It almost drove him crazy because they pointed out that he might have looked asleep for only a moment, but 20 years still took its toll. You're still under the naive impression that math is easier than martial arts and can't explain why you can't learn calculus by just watching videos on double speed...... It's because there is no such thing as instant learning and there will never be a universe where learning is different than spending legitimate time with a subject in a way that changes you as a person.

@alkebabish 9 months ago

The code behind these LLMs is not as complicated as it may seem, the issue is the millions of dollars worth of GPU hours you would need to train a big model. But imagine when that gets cheaper, things are going to get very competitive and interesting in the world of LLMs. This course is the next one on my list, I'm half-way through the FCC courses on Pytorch and OpenAPI at the moment and I think I have to finish at least one of them before starting another. Amazing courses, better than paid ones every time.

@koomloom 9 months ago

Share the link of the two courses you mentioned please.

@waynelast1685 7 months ago

Yes please

@annansm4293 7 months ago

Isn’t it possible to use gpu cloud?

@flotzdrue4770 7 months ago

@@annansm4293 of course, that's probably what they meant, but that costs a lot of money too.

@vindowmaker5819 6 months ago

@@annansm4293 that's what he's talking about when he said "...GPU hours (cloud gpu)"

@321123580 10 months ago

Thank you for sharing this information with us ❤

@biddlea76 10 months ago

46:00 np.multiply is elementwise multiplication which isn't comparable to dot product multiplication. To compare gpu to cpu you can use torch's @ multiply for both. Since the second two were not loaded to the gpu, they are computed by the cpu.
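
A rough sketch of the distinction raised above, assuming NumPy and PyTorch are installed. The 2x2 arrays, the matrix size, and the time_matmul helper are illustrative choices, not code from the video. It first shows that np.multiply and @ give different results, then times the same torch matrix product on the CPU and, if one is available, on a CUDA device.

```python
import time

import numpy as np
import torch


def time_matmul(device: str, n: int = 512, repeats: int = 5) -> float:
    """Average seconds for an n x n matrix product on the given device.

    Hypothetical helper for illustration only.
    """
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    a @ b  # warm-up, so one-time setup cost is not measured
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels run asynchronously
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / repeats


# Element-wise product and matrix product are different operations.
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0, 6.0], [7.0, 8.0]])
elementwise = np.multiply(x, y)  # multiplies matching entries
matrix_product = x @ y           # rows of x dotted with columns of y

# Time the SAME operation on each device for a fair comparison.
cpu_seconds = time_matmul("cpu")
print(f"cpu:  {cpu_seconds:.6f} s per matmul")
if torch.cuda.is_available():
    print(f"cuda: {time_matmul('cuda'):.6f} s per matmul")
```

Timings depend on the machine, so no expected numbers are given here; the point is only that both devices run the same operation.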

@georgehenrique2560 10 months ago

Yup. That was why the results of CPU were close to GPU. Element-wise are way quicker.

@devangpagare976 9 months ago

Wait, can you elaborate a bit? Was he supposed to load the NumPy matrix onto the GPU as well? @@georgehenrique2560

@GetRealwithMike 6 months ago

I get an io.h C1083 file error when I try to install pylzma. It can't find the file and I've done everything to make it work. Any idea how to fix this?

@anjanavabiswas8835 6 months ago

Hadamard Product, maybe?

@aqynbc 6 months ago

Simply gold. Really impressed.

@jonathanrebelo6852 1 month ago

I don't know how you did it.. but you somehow explained Transformers in a way this absolute python newbie could parse and understand. Great work and I'm glad I spent the time to follow along with this tutorial.

@teddysalas3590 8 months ago

Wonderful Tutorial !!!✨

@bas_abhi 8 months ago

For Mac users (this is what I did):
1. python -m venv VirtualEnv
2. pip install matplotlib numpy pylzma ipykernel jupyter
3. pip3 install torch torchvision torchaudio
4. Create the kernel: python -m ipykernel install --user --name=gpu_kernel --display-name "gpu kernel"

If you get "ERROR: Failed building wheel for cffi" while installing libraries, run: xcode-select --install

Inside the notebook: device = 'mps' if torch.backends.mps.is_available() else 'cpu'

@pratyushpankaj1680 8 months ago

Thank you so much

@LexFriedmanClips 7 months ago

it's so funny, gpt 4 helped me thru that entire thing lmaooo

@DK-ox7ze 6 months ago

Does this work on intel Macbook Pro having Intel's integrated GPU?

@amriteshamrit7128 1 month ago

I am not able to install pylzma; it says wheel error. Any solution?

@ahmadshabaz2724 10 months ago

Awesome work man. Lots of appreciation. Keep doing good stuff .

@Zelousfear 4 months ago

I have a hackathon tomorrow and I wanted to make a chat bot from some custom material. Bless you guys! This is perfect!

@RealRex 10 months ago

Thank you so much sharing this knowledge with us.

@martin_nav 10 months ago

Perfect timing. I was looking for tutorial for something like this today.

@elliotarledge 10 months ago

Glad I could get this timing for ya!

@sirabhop.s 1 month ago

I got so excited when he started explaining multi-head attention, thank you.

@rubenfranciscoarterobanare4862 10 months ago

This course is insanely incredible and the only con I can point to is that the voice is not clear at all (not because of the mic, just because of the way he speaks). Anyway, I totally recommend it and it's a really incredible contribution to the community, thank you so much

@rubenfranciscoarterobanare4862 10 months ago

2:32:36 diselooooooooooo

@divyansh6451 10 months ago

loved this!!

@pandithammultilingualcompu1552 7 months ago

Awesome video , well explained. You have a great future in IT research

@magus3695 7 months ago

damn , you just challenged me . am also studying bsc math but also am into cybersec . thanks mahn for real

@mmkamalraj8931 6 months ago

Hats off to this awesome dude... Whether its math or pytorch methods the explanations are spot on. GPT is no longer some super awesome AI... its just a bunch of Pytorch code. Completed the entire video in single sitting, tonnes of information and amazing tools to gain intuition

@ericschrdr 2 months ago

This huge sharing - thank you 🙏🏻🤙🏻

@buttert5091 9 months ago

Thanks, learnt a lot

@steveellison8686 1 month ago

Great job. You have an excellent communication style. Thanks so much.

@nothing_is_real_0000 7 months ago

Awesome, buddy!! This is great service to AI and tech community and you're sharing for free!!!

@nireph 6 months ago

Appreciate the work you put into this, you definitely know what you are talking about. Thanks! But this course also has a couple of methodical downsides which make it hard to follow independently:
- Your "drawings" could be better. I understand the limitations of "drawing with a mouse on the screen"; I suggest preparing those charts beforehand.
- Please try to establish a clean coding style: don't use shortened var names like i, x, y, q, wei and so on, or dummies like "for _ in range(foobar)". This makes it really hard to reproduce your code. Of course it's easier and quicker while in the process, but if you get back to the code after a 1-month vacation, you're lost, promise!
- Also please use common vocab: x is usually X because it's a tensor, while y is lowercase, showing it's a vector. Or even better, use "features" and "labels" as var names, and epoch instead of iters.
- Sometimes your recorded screen is too small: content is being cut off, or the content is too small and barely readable.
- There are some jumps in your video. For example, you showed the simple bigram model but did not really finish the work: we trained the model and then what? Your notebook has two cells printing some gibberish text, without any explanation.
- Please try to establish a naming convention for your source files; it's hard to follow.

A really, like really, good course about PyTorch does all those things: Daniel Bourke's intro to PyTorch, also on "fcc", highly recommended: kzbin.info/www/bejne/kJDMnHeintKBbKM

Again: appreciate the hard work, please don't get me wrong. It's hard to criticise something that is free; I just want to give back honest and constructive feedback =)

@Tripp111 6 months ago

I like your style. Well done, Caballero. And, thank you.

@JoseRobertoGonzalez 10 months ago

This channel is a gift for humanity

@dociler 2 months ago

Good course! Thanks Elliot!

@ninjacodertech 10 months ago

if you have an apple silicon mac you can use "mps" (mac gpu) rather than "cuda" or "cpu". this requires a little bit of extra setup but is quite simple.

@contadacarta 8 months ago

Can you point me to some documentation on how to set this up? I have an M1 and since it does not support cuda I am a bit lost

@RobSchatz 6 months ago

@@contadacarta I'm also interested in a mac set up, getting a lot of 'command not found'

@LannisterCoC 6 months ago

Just replace his cuda line with the following lines, and your code will work with Windows CUDA or Mac M1 MPS GPUs:
device = "cuda" if torch.cuda.is_available() else "cpu"
device = "mps" if torch.backends.mps.is_available() and torch.backends.mps.is_built() else device
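
The same idea can be wrapped in a small function. This is only a sketch under assumptions: pick_device is a hypothetical name, and the getattr guard assumes that older PyTorch builds may lack torch.backends.mps. It prefers CUDA, then MPS, then falls back to the CPU.

```python
import torch


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this machine offers.

    Hypothetical helper for illustration; not code from the course.
    """
    if torch.cuda.is_available():
        return "cuda"
    # Guard with getattr in case this PyTorch build has no MPS backend module.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_built() and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
x = torch.ones(2, 3, device=device)  # tensor is created on the chosen device
print(device, x.sum().item())
```

Creating tensors and the model with the returned device string keeps the rest of the notebook unchanged across Windows, Linux, and Mac.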

@naehalmulazim 2 months ago

Thank you! But tensorflow still does not fully support mac.

@amansahani2001 10 months ago

You guys are Insane. I was implementing this thing from scratch but failed.

@elliotarledge 10 months ago

Glad this course helped you out!

@krinodagamer6313 8 months ago

Im doing this now, looking up informational vids thank you

@lisastreet1398 3 months ago

Nice video! thanks a lot. 💫

@ProKaindra 1 month ago

Just amazing works. Merci beaucoup !

@SanjayRoy-vz5ih 10 months ago

This is a gem for whoever wants to know the transformer model. Bingo!!👏👏

@ThatsMistaTwistToYou 6 months ago

At approx 40:20, the matrix with the ones on the diagonal and the rest being zeroes is called an Identity matrix - I'm guessing that's why the torch method is called "eye"? Gives you the idea of what the method does but saves keystrokes :)
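
A quick sketch of what torch.eye returns, assuming PyTorch is installed; the 3x3 values are made up for illustration. Multiplying by the identity matrix leaves a matrix unchanged, which is what separates it from a general diagonal matrix (the identity is the special case with ones on the diagonal).

```python
import torch

identity = torch.eye(3)  # ones on the diagonal, zeros elsewhere
m = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0]])

# Multiplying by the identity on either side returns m unchanged.
left = identity @ m
right = m @ identity

# A general diagonal matrix scales the rows instead of preserving them.
diag = torch.diag(torch.tensor([2.0, 3.0, 4.0]))
scaled = diag @ m

print(identity)
print(torch.equal(left, m), torch.equal(right, m), torch.equal(scaled, m))
```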

@jonathan10543 10 months ago

Great job, thanks! I learned a ton

@poojansoni497 3 months ago

Really great explanation

@adamhori4883 4 months ago

Really hooked dude you can be serious if this course is good the you're a god in teaching computer programming

@the-ghost-in-the-machine1108 8 months ago

nice overview, thanks

@sumitmamoria 10 months ago

Very good video. It would have been even better if one could see some meaningful responses in the output.

@danielabraham3022 9 months ago

Would be amazing if the time stamps could be made into chapters. Other than that, thanks for the wonderful video tutorial!

@judevector 10 months ago

Wow wow this is just an amazing time for me to delve deeper into AI coding

@mac.ignacio 10 months ago

Booking this for future reference

@Quinten7771 4 months ago

Awesome video thank you.

@kavorka8855 8 months ago

I am watching this, at the 3rd minute, and this confident young lecturer convinced me to continue watching! :D

@datahacker1405 8 months ago

People debating over whether this course is from scratch or not are funny. If you complete it you will learn a lot and it will also help you understand research papers and implement them.

@rajdeepmajumdar7323 10 months ago

Free code camp releasing fire 🔥 tutorials these days

@techstartup2670 7 months ago

thanks bro your dedication.

@letsmotivate-ii1vq 10 months ago

Nice job Steve

@elliotarledge 10 months ago

Lets goooo!

@tosinharold 10 months ago

This is a good stuff!!!❤❤❤

@winnerleparadoxe6496 10 months ago

Am so excited

@slowedreverb6819 8 months ago

40:21 that matrix is called a diagonal matrix , where the diagonal element is not zero and rest of elements are zero

@School_of_Technology 5 months ago

1 hour 30mins done - took 3 hours to complete worth every second

@zaursamedov8906 10 months ago

Elliot brother congrats bud

@xugefu 7 months ago

Thanks!

@ivanseredkin1090 10 months ago

wow! thank you!

@OscarSierraLima 8 months ago

this tutorial is great, thanks for posting. can i use the model you work through here for summarization?

@ronitakhariya4094 10 months ago

are you guys gods? 🙇Huge respect as always

@dochuong3002 10 months ago

Thank you !

@hipertracker 4 months ago

You can run Pythonic functions in Mojo but the call has to be wrapped with try-except. E.g.

def add(a: Int, b: Int) -> Int:
    return a + b

fn main():
    try:
        var c = add(3,4)
        print(c)
    except:
        print("Error")

@lncsaurabh 2 months ago

Wow. This kid is brilliant 🙏

@adityanjsg99 6 months ago

I learnt from this channel what I could not from college and those 000 costing courses!

@nikosterizakis 4 months ago

Mistake in 1:46' - When you create a class that uses nn.Module you HAVE to define the 'forward' method. It is in the Torch docs for nn.
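
A small sketch of what this looks like in practice, assuming a recent PyTorch build. Both class names and the layer sizes are hypothetical. Constructing a subclass without forward works, but calling it raises NotImplementedError, which is why forward is effectively required.

```python
import torch
import torch.nn as nn


class NoForward(nn.Module):
    """Hypothetical module that forgets to define forward()."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)


class WithForward(nn.Module):
    """Same layer, but with forward() defined."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


x = torch.rand(3, 4)
out = WithForward()(x)  # calling the module dispatches to forward()

try:
    NoForward()(x)  # construction succeeds; the call is what fails
    missing_forward_error = None
except NotImplementedError as err:
    missing_forward_error = err

print(out.shape)
print(type(missing_forward_error).__name__)
```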

@ShadowMind312 10 months ago

I wish this lecture was up when I took my Text Mining course, lol

@elliotarledge 10 months ago

text mining courses are indeed quite rare to find, glad you found it helpful!

@kevinscaria 9 months ago

In the Head class:
attention_scores = attention_scores.masked_fill(self.tril[:T, :T] == 0, float('-inf'))
Why are you indexing the registered buffer containing torch.tril with [:T, :T]? Won't torch.tril already have the shape of the timestep dimension by default, which is block_size?
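
One likely reason, sketched under assumptions (block_size = 8 and the masked_scores helper are made up for illustration, and this is not the course's exact code): the buffer is built once at block_size x block_size, but the sequence length T of a given input can be shorter, for example early in generation when the context is still small. Slicing to [:T, :T] keeps the mask the same shape as the score matrix.

```python
import torch

block_size = 8  # hypothetical maximum context length
tril = torch.tril(torch.ones(block_size, block_size))  # built once, full size


def masked_scores(scores: torch.Tensor) -> torch.Tensor:
    """Apply a causal mask to (batch, T, T) scores, where T <= block_size."""
    T = scores.shape[-1]
    return scores.masked_fill(tril[:T, :T] == 0, float("-inf"))


short = torch.zeros(1, 3, 3)  # T = 3, shorter than block_size
masked = masked_scores(short)
weights = torch.softmax(masked, dim=-1)  # each row attends only to the past

# Without the slice, the (8, 8) mask cannot broadcast against (1, 3, 3).
try:
    short.masked_fill(tril == 0, float("-inf"))
    unsliced_failed = False
except RuntimeError:
    unsliced_failed = True

print(weights[0])
print(unsliced_failed)
```

When T equals block_size, the slice is a no-op, so it costs nothing during training on full-length blocks.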

@REDULE26 3 months ago

I would love to see a tutorial about creating an image generation model ❤

@AyushSharma-ug9ni 10 months ago

Great video for intermediate guys ☺️... though freshies will have a hard time 😅

@zilogadabrox 7 months ago

Great course and much appreciated. Resolution of command line is however grainy & hard to read. Any chance of an HD re-cut?

@user-jb1wt6cn3n 4 months ago

Nice course :) No one noticed the miscalculation around 1:25:12: the dot product (3x7)+(4x10) should be 61, not 47. But the course is amazing, just a little smile that everyone can make mistakes.
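
The arithmetic is easy to check in plain Python. The vectors [3, 4] and [7, 10] are assumed from the expression quoted in the comment, and dot is a hypothetical helper, not a function from the course.

```python
def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


result = dot([3, 4], [7, 10])  # 3*7 + 4*10 = 21 + 40
print(result)  # 61
```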

@Dave-nz5jf 9 months ago

This is good for detail, but for an overall understanding of what you're doing it's a little tough to follow. If this user followed the pattern of tell them what you're going to do, do it, and then tell them what they've done, this would be much easier to consume. But still a great effort.

@optimbro 10 months ago

I think I can't do it, but thanks for sharing!

@Canadainfo 6 months ago

beautiful - nicer explanation based on Andrej Karpathy

@ZeriumYT 6 months ago

I just have one question. At the end I see generated text, and even from my basic understanding I can tell that is a good thing. However, the generated text is not readable. I did not watch the whole 5 hour thing so I don't know if it was covered, but how would one go about getting readable text? Is it just by giving more input questions or by increasing the batch amount?

@adamgilchristjoy5607 8 months ago

Any idea on how to implement this trained model for useful applications like chatbots etc?

@abhisheksharma2305 3 months ago

I have started learning this today I am dedicated to learn these concepts and apply it in next 15 days! Please follow up If I dont complete it.

@user-vi8ig1dw2u 10 months ago

i request from the freecodecamp that please make a complete hardware engineering course

@Hugo_Youtube 10 months ago

Hows the hardware requirement ? Any minimum CUDA requirement?

@VolodymyrInTech 10 months ago

💛💙👍This is a good stuff!!

@philharmonicaz 8 months ago

So I'm stuck already at the beginning. Using macOS, installed Anaconda, there doesn't seem to be a CLI like what you're using. I don't see an installed Anaconda prompt or a way to access it from macOS term.