GPT-2 from Scratch in C (Day 1/2)

39,604 views

Raff K

1 day ago

Comments: 95
@DeepSeek-R1 1 month ago
Timestamps
00:07 - Reimplementing GPT-2 from scratch in C without using libraries
03:25 - Beginning basic math operations and data set exploration in C
10:19 - Transforming input into a 768-dimensional embedding space in GPT-2
13:44 - Introduction to token embeddings and position embeddings in GPT-2 sequence processing
20:18 - Layer normalization as an essential part of Transformer blocks
23:43 - Explanation of layer normalization, attention block, residual connections, and MLP in the GPT-2 architecture
30:37 - Introduction to the attention block in GPT-2
34:08 - Overview of attention heads and dimensionality transformation
40:39 - Explaining the mechanism of attention blocks in the GPT-2 model
43:47 - Overview of attention computation in GPT-2
50:34 - Discussing function graphs and transformations in GPT-2 programming
53:52 - Model architecture overview and Python implementation planning
1:00:02 - Conversion between tokens and the original string in GPT-2
1:03:11 - Introduction to saving weights in the safetensors file format
1:09:27 - Creating a torch tensor from a byte array and reshaping it
1:13:01 - Transformation of a 50,000-dimensional vector into 768 dimensions explained
1:19:05 - Training a sequence prediction model for token prediction
1:22:02 - Exploring matrix dimensions and tensor usage in embedding functions
1:28:26 - Normalization process in the GPT-2 model
1:31:54 - Explanation of linear layers and their components
1:38:33 - Splitting the matrix into Q, K, and V parts for 12 attention heads
1:41:55 - Splitting the Q matrix and matrix multiplication
1:48:37 - Implementing attention masking before the softmax calculation
1:51:45 - Setting elements in a matrix to negative infinity
1:58:28 - Performing the weighted sum of V vectors, followed by the attention C projection matrix
2:01:31 - Explaining matrix multiplication in C
2:08:29 - Implementing the linear layer and residual connection in the attention block
2:11:53 - Describing the layer norm and multi-layer perceptron block
2:18:51 - Implementing multiple layers with input/output connections in a for loop
2:22:13 - Implementing key components of the attention mechanism in the GPT-2 model
2:28:54 - Generating an output matrix based on tokens in the sequence
2:33:00 - Debugging issues with layer indexing and normalization in the model implementation
2:39:27 - Cross-entropy loss evaluation
2:42:16 - Transition from the Python implementation to handling data set I/O in C
2:48:10 - Memory allocation preprocessing for the data structure in C
2:51:28 - Creating a struct in C to store the offset and size of a string
2:57:47 - Writing UTF-8 encoded struct members to a file
3:01:09 - Exploring data manipulation in C and compiling the program
3:07:49 - Discussing memory allocation and file reading strategy
3:11:13 - Explanation of struct allocation with offsets and pointers
3:17:49 - Printing strings in C using offset and size
3:21:44 - Discussing differences in struct size and alignment
3:28:12 - Packing and writing unsigned shorts to a file in C
3:31:44 - Array of 16-bit unsigned integers used as tokens for file processing
3:37:51 - Aligning data structures and testing for alignment accuracy
3:42:02 - Decoding tokens back into strings using an encoding table
3:48:52 - Optimizing data type consistency for printing operations
3:52:16 - Extracting relevant info from JSON with a predictable format
3:58:48 - Working with pointers in C for data offsets
4:02:16 - Extracting start and end offsets from a JSON string and converting them to integer values
4:09:49 - Defining the expected size for a tensor
4:13:26 - Testing the weight embedding matrix
4:19:26 - Structuring the model parameters in C
4:22:35 - Calculating parameter size based on the JSON definition and file structure
4:29:10 - Discussing the token encoding weight matrix in parameter manipulation
4:32:20 - Preparing for matrix transformations in the code
4:39:18 - Setting up matrices and weights for neural network layers
4:43:01 - Setting up model parameters and proceeding towards calculations
4:49:23 - Creating pointers to select rows in matrices
4:52:56 - Summing elements of vectors using C code
4:59:41 - Implementing mean and standard deviation calculations in C
5:03:04 - Explaining the calculation of variance and standard deviation
5:09:32 - Implementing the attention block in C
5:13:28 - Matrix multiplication in the GPT-2 transformation process
5:20:01 - Incrementing the output vector based on weight matrix calculations
5:23:05 - Incrementing values in matrix multiplication
5:29:31 - Calculating the dot product of vectors in the C implementation
5:32:53 - Implementation of Q and K vectors in the dot product loop
5:40:24 - Implementing the softmax operation for probability values
5:43:44 - Calculating softmax in C
5:50:21 - Implementing the V vector multiplication to generate an output vector
5:53:47 - Implementing weighted sum calculations efficiently
6:00:58 - Identifying and addressing errors in the coding process
6:04:22 - Implementing the softmax function with max value initialization
6:11:47 - Introduction to the attention block and multiplication with the C projection matrix
6:14:51 - Introduction to key components of data processing in C
6:22:45 - Implementing the FC call in C
6:25:45 - Implementation of the attention block in C
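For readers skimming the timestamps, the linear layer (1:31:54), the "incrementing the output vector" loop (5:20:01) and the residual connection (2:08:29) can be sketched roughly as below. This is only an illustrative sketch, not the code written in the video: the function names `linear_forward` and `residual_add`, and the row-major `n_in x n_out` weight layout, are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the video): y = x * W + b for one token vector.
 * x: n_in floats, b and y: n_out floats.
 * W: n_in x n_out, row-major. This layout is an assumption for the sketch. */
static void linear_forward(const float *x, const float *W, const float *b,
                           float *y, size_t n_in, size_t n_out)
{
    assert(x && W && b && y);

    /* Start from the bias... */
    for (size_t j = 0; j < n_out; j++)
        y[j] = b[j];

    /* ...then increment the output once per input element, walking W row by row. */
    for (size_t i = 0; i < n_in; i++) {
        const float *w_row = W + i * n_out;
        for (size_t j = 0; j < n_out; j++)
            y[j] += x[i] * w_row[j];
    }
}

/* Hypothetical helper: residual connection, x += delta, in place. */
static void residual_add(float *x, const float *delta, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] += delta[i];
}
```

Looping over the input index on the outside keeps the inner loop walking one contiguous row of `W`, which tends to be cache-friendly for a row-major layout. With a transposed weight layout the two loops would swap roles.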
@aleksey2960 1 month ago
Wow this looks long and hard. Will put these in my “to do” playlists. Also just subbed cos your content looks awesome 🤩
@MCroppered 29 days ago
Is your todo playlist also in the 1000s of hours like mine? 😂
@vpn740 29 days ago
@MCroppered yours is only 1000s of hours? 😄
@aleksey2960 28 days ago
@ yeah my “watch later” playlist had like 1000+ videos in it lol 😂
@harshsharma03 28 days ago
that's what she said
@4kumetsu 24 days ago
pause
@s-codes14 29 days ago
GigaChad
@hypermeero4782 1 month ago
i sometimes wonder if i'll ever be that smart
@JJGhostHunters 29 days ago
Don't be so hard on yourself. It is not always "smart" to recreate the wheel when you have deadlines, life and property that may be dependent upon the stability of a machine learning based system.
@ChristosChristides 28 days ago
@JJGhostHunters Bro is casually spitting philosophy on a random comment on a random video
@gatogordo4131 28 days ago
But that’s the best part, the endless path of knowledge
@1n4f4bl3 28 days ago
Building something that already exists is not necessarily smart. However, if it helps this person get an AI job, he might end up overprepared and working under others who know less but hold higher positions. In that sense, it’s not smart at all.. it’s actually quite foolish, but he shows a naive young generosity.
@yotelolailo 27 days ago
@1n4f4bl3 Not everything has to be done to get a job. Some people just like learning and build from scratch just for the pleasure of learning or as a challenge.
@actualBIAS 25 days ago
THIS IS THE CONTENT I'VE BEEN LOOKING FOR! Haha like Andrej! I love this!
@ibrahimnaser5233 22 days ago
most basic project required for new grad in 2025
@speedybonsky 25 days ago
Premium mental insanity 🔥🔥🔥
@nicholaskomsa1777 26 days ago
Hello, I think my comments get deleted, but thank you. I have seen 90% and plan to experience it all much more, while writing in C++. I appreciate your data-oriented C very much; the next part of this video is on my radar.
@florianionescu185 27 days ago
nah, I'll just continue watching netflix, thank you.
@pkmx-um9vb 27 days ago
🤣
@Abhijith-e5t 15 days ago
Big L haha
@someone-x64 1 month ago
this is going to be so much fun.
@omarwagmes 1 month ago
Thank you for this video love from Morocco
@PauloDutra 1 month ago
Bro, this is insane... Thanks
@ditdit-dahdah-ditdit-dah 24 days ago
Looks good. Keep it up. Waiting for more.
@qaqkirby9781 28 days ago
thank you for your sharing. love from china
@gatogordo4131 29 days ago
Amazing, already subscribed. thanks for sharing this!
@wodniktoja8452 1 month ago
cant wait to see second part :)))
@MrManGuyManGuy 6 days ago
Now time for DeepSeek in C 😁
@danielhemmati 29 days ago
i thought it's andrej karpathy
@AmanKumar-jk1qu 27 days ago
love you please do more such videos
@mamitianasolofo723 26 days ago
Damn it, I do not have time for all of those good things :'(
@antixon 29 days ago
Thanks for sharing!😊
@AntonioLopez8888 25 days ago
This is really what work-life balance means for a dev
@Jemal-k2u 20 days ago
sigma
@mahmoudabdelsattar8860 27 days ago
what is your specialty, this vid was great
@abrarmasumabir3809 1 month ago
keep doing the good work.
@pedrogutierrez1430 27 days ago
Legend
@KshitijKaushik-w2g 28 days ago
this is crazy af
@ffaheem 19 days ago
what's his resume like? I bet this is one of bro's side projects
@yashsehgal1410 29 days ago
Pop me some popcorn and I am good to go
@rssszz7208 23 days ago
Am learning cpp, this goes over my head😅
@ogulcan2877 1 month ago
what is this bro i understand nothin
@ashleigh3021 28 days ago
Negative IQ
@DriftJunkie 28 days ago
Black magic
@user-plgmgrs326 28 days ago
english please 🥺
@eyzake 25 days ago
i am 30 minutes in, kinda remembered this comment (about a year ago i had a math basics for ai course) i did understand the math but rest is going over my head. man i am only here for c but i cant seeeeeeee
@a-medee 6 days ago
@eyzake you can't C
@mbarrio 1 month ago
What is that text editor? SublimeText?
@joefitahiana 29 days ago
i think, it's zed editor
@html69 27 days ago
its the last important thing
@himalayo 26 days ago
@html69 it does look pretty and comfy, though
@philtoa334 22 days ago
Nice.
@anren7445 28 days ago
madlad
@SyedAsifShah-w4h 4 days ago
Please Code Editor name
@lmnottryhard5626 1 month ago
Bro legend
@pavelyankouski4913 27 days ago
I would start with the application interface. Then I would use Gemini
@benjaminryan9354 25 days ago
Does anybody know what IDE/text editor he is using?
@ffaheem 19 days ago
such monsters really exist, don't they?
@ibrahimnaser5233 22 days ago
what is your background
@skope2055 1 month ago
Cool video :)
@uptimehalil 29 days ago
I think rocket science is easier 😅
@BURN-ADDiCT 28 days ago
As rocket science, I can confirm I am easier
@richardappow6770 1 month ago
what drawing app are you using here?
@msakg 25 days ago
Notability
@shakilkhan4306 24 days ago
Awesome bro.. ❤
@mahmoudabdelsattar8860 27 days ago
@na50r24 1 month ago
I haven't watched the whole video but 'plan' to do it eventually. To confirm, are you just trying to get inference to work? There is a video about a guy doing the MNIST number identification problem in Minecraft with redstone. He trained the weights in Python and then had to figure out how to implement something that performs the inference in Minecraft with those weights pre-established. Are you doing the same but now in C with GPT-2? Because while prepping for an ML exam, I also kind of got into implementing things from scratch, and I ended up figuring out a way to implement backpropagation and got it to work to solve the XOR problem. However, what I implemented was a computational graph, so I had to hardcode matrix multiplications. This video made me think I could first try to implement my CompGraph in C, which should be doable, and then build on top of it from there and figure out how to connect matrix multiplication to my comp graphs, assuming that's even possible... If I actually manage to get that to work, we have a PyTorch-like thing for C (probably already exists, but it's a good exercise anyway to check your ML understanding on a lower level)
@StatisticallySpeaking1 26 days ago
how is this even possible
@StatisticallySpeaking1 26 days ago
and why are you torturing yourself? Quant dev?
@arunray2986 1 month ago
I have no idea what's going on😅
@gr4ytxx433 1 month ago
Bro WTF;
@santiagomartinez3417 29 days ago
Python would be too slow right?
@programmingpillars6805 27 days ago
not really, things that are costly in terms of computation are done in C++ even if you're using Python, because libraries like PyTorch use C++ in the backend to do the heavy computation ...
@emptycode1782 1 month ago
Wtf
@surajmandal_567 29 days ago
I will come back after 15 days and start the series. 🎉
@Kyoz 1 month ago
🤍
@Wggwjzjjxmsk 25 days ago
Wtf
@bArda26 24 days ago
weak, you should have built your own transistors first.
@grcreed5480 24 days ago
Can anyone tell me what topics to learn to be able to do this ......i know dsa in C