GPT-2 from Scratch in C (Day 1/2)

39,604 views

a day ago

Comments: 95

@DeepSeek-R1 a month ago

Timestamps
00:07 - Creating a reimplemented GPT-2 from scratch in C without using libraries
03:25 - Beginning basic math operations and dataset exploration in C
10:19 - Transforming input into a 768-dimensional embedding space in GPT-2
13:44 - Introduction to token and position embeddings in GPT-2 sequence processing
20:18 - Layer normalization: essential in Transformer blocks
23:43 - Explanation of layer normalization, the attention block, residual connections, and the MLP in the GPT-2 architecture
30:37 - Introduction to the attention block in GPT-2
34:08 - Overview of attention heads and dimensionality transformation
40:39 - Explaining the mechanism of attention blocks in the GPT-2 model
43:47 - Overview of attention computation in GPT-2 from Scratch in C
50:34 - Discussing function graphs and transformations in GPT-2 programming
53:52 - Model architecture overview and Python implementation planning
1:00:02 - Conversion between tokens and the original string in GPT-2
1:03:11 - Introduction to saving weights in the safetensors file format
1:09:27 - Creating a torch tensor from a byte array and reshaping it
1:13:01 - Transformation of a 50,000-dimensional vector into 768 dimensions explained
1:19:05 - Training a sequence prediction model for token prediction
1:22:02 - Exploring matrix dimensions and tensor usage in embedding functions
1:28:26 - Normalization process in the GPT-2 model
1:31:54 - Explanation of linear layers and their components
1:38:33 - Splitting the matrix into Q, K, and V parts for 12 attention heads
1:41:55 - Splitting the Q matrix and matrix multiplication
1:48:37 - Implementing attention masking before the softmax calculation
1:51:45 - Setting elements in a matrix to negative infinity
1:58:28 - Performing a weighted sum of V vectors with the attention C projection matrix
2:01:31 - Explaining matrix multiplication in C for GPT-2 from Scratch
2:08:29 - Implementing the linear layer and residual connection in the attention block
2:11:53 - Describing the layer norm and multi-layer perceptron block
2:18:51 - Implementing multiple layers with input/output connections in a for loop
2:22:13 - Implementing key components of the attention mechanism in the GPT-2 model
2:28:54 - Generating an output matrix based on tokens in the sequence
2:33:00 - Debugging issues with layer indexing and normalization in the model implementation
2:39:27 - Implementing GPT-2 in C involves cross-entropy loss evaluation
2:42:16 - Transition to handling dataset I/O in C from the Python implementation
2:48:10 - Memory allocation preprocessing for data structures in C
2:51:28 - Creating a struct in C to store the offset and size of a string
2:57:47 - Writing UTF-8 encoded struct members to a file
3:01:09 - Exploring data manipulation in C and compiling the program
3:07:49 - Discussing memory allocation and file reading strategy
3:11:13 - Explanation of struct allocation with offsets and pointers
3:17:49 - Printing strings in C using offset and size
3:21:44 - Discussing differences in struct size alignment
3:28:12 - Packing and writing unsigned shorts to a file in C
3:31:44 - Array of 16-bit unsigned integers used as tokens for file processing
3:37:51 - Aligning data structures and testing alignment accuracy
3:42:02 - Decoding tokens back into strings using an encoding table
3:48:52 - Optimizing data type consistency for printing operations
3:52:16 - Extracting relevant info from JSON with a predictable format
3:58:48 - Working with pointers in C for data offsets
4:02:16 - Extracting start and end offsets from a JSON string and converting them to integer values
4:09:49 - Defining the expected size for a tensor
4:13:26 - Testing the weight embedding matrix
4:19:26 - Structuring parameters for the model in C
4:22:35 - Calculating parameter size based on the JSON definition and file structure
4:29:10 - Discussing the token encoding weight matrix in parameter manipulation
4:32:20 - Preparing for matrix transformations in the code
4:39:18 - Setting up matrices and weights for neural network layers
4:43:01 - Setting up model parameters and proceeding towards calculations
4:49:23 - Creating pointers to select rows in matrices
4:52:56 - Summing elements of vectors using C code
4:59:41 - Implementing mean and standard deviation calculations in C for GPT-2 model initialization
5:03:04 - Explaining the calculation of variance and standard deviation
5:09:32 - Implementing the attention block in C for GPT-2
5:13:28 - Matrix multiplication in the GPT-2 transformation process
5:20:01 - Incrementing the output vector based on weight matrix calculations
5:23:05 - Incrementing values in matrix multiplication
5:29:31 - Calculating the dot product of vectors in the C implementation
5:32:53 - Implementation of Q and K vectors in the dot product loop
5:40:24 - Implementing the softmax operation for probability values
5:43:44 - Calculating softmax in C
5:50:21 - Implementing the V vector multiplication to generate an output vector
5:53:47 - Implementing weighted sum calculations efficiently
6:00:58 - Identifying and addressing errors in the coding process
6:04:22 - Implementing the softmax function with max value initialization
6:11:47 - Introduction to the attention block and multiplication with the C projection matrix
6:14:51 - Introduction to key components of data processing in C
6:22:45 - Implementing FC call in C for GPT-2 neural network training
6:25:45 - Implementation of the attention block in C from the GPT-2 tutorial

@aleksey2960 a month ago

Wow this looks long and hard. Will put these in my “to do” playlists. Also just subbed cos your content looks awesome 🤩

@MCroppered 29 days ago

Is your todo playlist also in the 1000s of hours like mine? 😂

@vpn740 29 days ago

@@MCroppered yours is only 1000s of hours? 😄

@aleksey2960 28 days ago

@ yeah my “watch later” playlist had like 1000+ videos in it lol 😂

@harshsharma03 28 days ago

that's what she said

@4kumetsu 24 days ago

pause

@s-codes14 29 days ago

GigaChad

@hypermeero4782 a month ago

i sometimes wonder if i'll ever be that smart

@JJGhostHunters 29 days ago

Don't be so hard on yourself. It is not always "smart" to reinvent the wheel when you have deadlines, and when life and property may depend on the stability of a machine-learning-based system.

@ChristosChristides 28 days ago

@@JJGhostHunters Bro is casually spitting philosophy on a random comment on a random video

@gatogordo4131 28 days ago

But that’s the best part, the endless path of knowledge

@1n4f4bl3 28 days ago

Building something that already exists is not necessarily smart. However, if it helps this person get an AI job, he might end up overprepared and working under others who know less but hold higher positions. In that sense, it’s not smart at all.. it’s actually quite foolish, but he shows a naive young generosity.

@yotelolailo 27 days ago

@@1n4f4bl3 Not everything has to be done to get a job. Some people just like learning and build from scratch just for the pleasure of learning or as a challenge.

@actualBIAS 25 days ago

THIS IS THE CONTENT I'VE BEEN LOOKING FOR! Haha, like Andrej! I love this!

@ibrahimnaser5233 22 days ago

most basic project required for new grad in 2025

@speedybonsky 25 days ago

Premium mental insanity 🔥🔥🔥

@nicholaskomsa1777 26 days ago

Hello, I think my comments get deleted, but thank you, I have seen 90% and plan to experience it all much more, while writing in C++. I appreciate your data-oriented-c much, the next part in this video is on my radar.

@florianionescu185 27 days ago

nah, I'll just continue watching netflix, thank you.

@pkmx-um9vb 27 days ago

🤣

@Abhijith-e5t 15 days ago

Big L haha

@someone-x64 a month ago

this is going to be so much fun.

@omarwagmes a month ago

Thank you for this video love from Morocco

@PauloDutra a month ago

Bro, this is insane... Thanks

@ditdit-dahdah-ditdit-dah 24 days ago

Looks good. Keep it up. Waiting for more.

@qaqkirby9781 28 days ago

Thank you for sharing. Love from China.

@gatogordo4131 29 days ago

Amazing, already subscribed. thanks for sharing this!

@wodniktoja8452 a month ago

cant wait to see second part :)))

@MrManGuyManGuy 6 days ago

Now time for DeepSeek in C 😁

@danielhemmati 29 days ago

i thought it's andrej karpathy

@AmanKumar-jk1qu 27 days ago

love you please do more such videos

@mamitianasolofo723 26 days ago

Damn it, I do not have time for all of those good things :'(

@antixon 29 days ago

Thanks for sharing!😊

@AntonioLopez8888 25 days ago

This is really what work-life balance means for a dev

@Jemal-k2u 20 days ago

sigma

@mahmoudabdelsattar8860 27 days ago

What is your specialty? This vid was great.

@abrarmasumabir3809 a month ago

keep doing the good work.

@pedrogutierrez1430 27 days ago

Legend

@KshitijKaushik-w2g 28 days ago

this is crazy af

@ffaheem 19 days ago

What's his resume like? I bet this is one of bro's side projects.

@yashsehgal1410 29 days ago

Pop me some popcorn and I am good to go

@rssszz7208 23 days ago

I'm learning C++, this goes over my head😅

@ogulcan2877 a month ago

what is this bro i understand nothin

@ashleigh3021 28 days ago

Negative IQ

@DriftJunkie 28 days ago

Black magic

@user-plgmgrs326 28 days ago

english please 🥺

@eyzake 25 days ago

I am 30 minutes in and kinda remembered this stuff (about a year ago I had a math basics for AI course). I did understand the math, but the rest is going over my head. Man, I am only here for C but I can't seeeeeeee

@a-medee 6 days ago

@@eyzake you can't C

@mbarrio a month ago

What is that text editor? SublimeText?

@joefitahiana 29 days ago

i think, it's zed editor

@html69 27 days ago

it's the least important thing

@himalayo 26 days ago

@@html69 it does look pretty and comfy, though

@philtoa334 22 days ago

Nice.

@anren7445 28 days ago

madlad

@SyedAsifShah-w4h 4 days ago

Code editor name, please?

@lmnottryhard5626 a month ago

Bro legend

@pavelyankouski4913 27 days ago

I would start with the application interface. Then I would use Gemini

@benjaminryan9354 25 days ago

Does anybody know what IDE/text editor he is using?

@ffaheem 19 days ago

Such monsters really exist, don't they?

@ibrahimnaser5233 22 days ago

what is your background

@skope2055 a month ago

Cool video :)

@uptimehalil 29 days ago

I think rocket science is easier 😅

@BURN-ADDiCT 28 days ago

As rocket science, I can confirm I am easier

@richardappow6770 a month ago

what drawing app are you using here?

@msakg 25 days ago

Notability

@shakilkhan4306 24 days ago

Awesome bro.. ❤

@na50r24 a month ago

I haven't watched the whole video but 'plan' to do it eventually. To confirm, are you just trying to get inference to work? There is a video about a guy solving the MNIST digit-recognition problem in Minecraft with redstone. He trained the weights in Python and then had to figure out how to implement something that performs inference in Minecraft with those pre-established weights. Are you doing the same, but in C with GPT-2? While prepping for an ML exam, I also got into implementing things from scratch, and I ended up figuring out a way to implement backpropagation and got it to work on the XOR problem. However, what I implemented was a computational graph, so I had to hardcode matrix multiplications. This video made me think I could first implement my CompGraph in C, which should be doable, then build on top of it and figure out how to connect matrix multiplication to my computational graphs, if that's even possible... If I actually manage to get that to work, we'd have a PyTorch-like thing for C (one probably already exists, but it's a good exercise anyway to check your ML understanding at a lower level).

@StatisticallySpeaking1 26 days ago

how is this even possible

@StatisticallySpeaking1 26 days ago

and why are you torturing yourself? Quant dev?

@arunray2986 a month ago

I have no idea what's going on😅

@gr4ytxx433 a month ago

Bro WTF;

@santiagomartinez3417 29 days ago

Python would be too slow right?

@programmingpillars6805 27 days ago

Not really. Things that are costly in terms of computation are done in C++ even when you're using Python, because libraries like PyTorch use C++ in the backend to do the heavy computation...